Discoverability: apply rel=canonical to record detail and library pages

Bug #1406451 reported by Dan Scott
This bug affects 1 person
Affects: Evergreen
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

* Evergreen master and all TPAC versions

Search engines use the "<link rel=canonical>" convention to determine that, no matter how many variations on a URL you might generate (for example, by appending &query= parameters), those URLs should all be treated as equivalent to the URL given in the href attribute of the <link> element.

A simplistic approach for record detail pages would be to trim record URLs down to their record ID and throw away all query params. For the purpose of holdings, it might be more accurate to include the org_unit params. However, that would make the number of possible URLs equal to the number of records times the number of OUs (that is, a huge ballooning in the number of URLs we would expect search engines to index), so I'm tempted to avoid that for a first iteration.

Library pages are a little easier because we don't have to worry about query params, but we do have to choose between making either the shortname or the numeric ID the canonical version. As the holdings link to the shortname version of the library URL, that's what I'm tempted to go with.

A third consideration is that some consortial sites use different hostnames for different parts of their consortium (e.g. we have laurentian.concat.ca for Laurentian and algoma.concat.ca for Algoma). I suspect we'll want to allow URLs to be canonical per hostname: e.g. https://laurentian.concat.ca/eg/opac/record/666 and https://algoma.concat.ca/eg/opac/record/666 should be kept separate, as the likely difference is branding--which means little to a search engine, but everything to libraries who want their users to land on their branded version of a record page, not a partner university's version of a record page.
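
To make the "trim to record ID, keep the hostname" idea concrete, here is a minimal sketch in Python. It is not the code from the branch (TPAC itself is not written in Python); the function names and the path regex are assumptions made for illustration only. It drops the query string and fragment but preserves the scheme and hostname, so each branded host stays canonical for itself.

```python
import html
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of a record detail path, e.g. /eg/opac/record/666
RECORD_PATH = re.compile(r"^(/eg/opac/record/\d+)(?:/.*)?$")


def canonical_record_url(url):
    """Return the canonical URL for a record detail page, or None otherwise."""
    parts = urlsplit(url)
    match = RECORD_PATH.match(parts.path)
    if match is None:
        return None
    # Keep scheme and hostname (canonical per hostname); drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, match.group(1), "", ""))


def canonical_link(url):
    """Build the <link> element for the page <head>, or '' if not a record page."""
    href = canonical_record_url(url)
    if href is None:
        return ""
    return '<link rel="canonical" href="{}" />'.format(html.escape(href, quote=True))


print(canonical_link("https://algoma.concat.ca/eg/opac/record/666?query=cats;locg=105"))
```

With this approach the Laurentian and Algoma URLs for record 666 each resolve to their own canonical URL rather than collapsing into one.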

Tags: pullrequest
Dan Scott (denials) wrote :

Please see http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/lp1406451_rel_canonical for an initial simple approach to providing rel=canonical declarations for library and record pages.

I haven't addressed the library numeric ID vs. shortname question, because upon further reflection I think the best way to handle that would be to issue a 301 that redirects from the numeric ID to the shortname, and that should be the subject of a separate bug.
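
For whoever picks up that separate bug, the redirect logic might look roughly like the following Python sketch. Everything in it is hypothetical: the /eg/opac/library/ path shape, the function name, and the sample ID-to-shortname table are assumptions for illustration, not Evergreen code or data.

```python
import re

# Hypothetical sample data: org unit numeric ID -> shortname
SHORTNAMES = {1: "CONS", 4: "BR1"}

# Assumed shape of a library page addressed by numeric ID
LIBRARY_ID_PATH = re.compile(r"^/eg/opac/library/(\d+)$")


def redirect_for(path, shortnames=SHORTNAMES):
    """Return (301, location) when path uses a known numeric ID, else None."""
    match = LIBRARY_ID_PATH.match(path)
    if match is None:
        return None  # already a shortname URL, or not a library page
    shortname = shortnames.get(int(match.group(1)))
    if shortname is None:
        return None  # unknown ID: let normal handling deal with it
    return 301, "/eg/opac/library/{}".format(shortname)


print(redirect_for("/eg/opac/library/4"))
```

A permanent redirect leaves only the shortname URL for search engines to index, so no rel=canonical decision is needed for the numeric form.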

tags: added: pullrequest
Changed in evergreen:
milestone: none → 2.next
Dan Scott (denials)
Changed in evergreen:
milestone: 2.next → 2.8-beta
Dan Scott (denials) wrote :

Added another commit to put '<meta name="robots" content="noindex">' into the <head> of search results pages, call number browsing, general browsing, the advanced search page, and lists, because these are ultimately unlikely to be helpful to search engines or users.

Note that even though you might block these pages via robots.txt, search engines will still index them if they find links to the pages from elsewhere--such as the subjects in our record pages--per https://support.google.com/webmasters/answer/6062608?hl=en
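
As a rough illustration of the per-page decision (not the actual template change), the sketch below emits the robots tag only for the page types listed above. The page-type identifiers and the function name are made up for this example.

```python
# Hypothetical page-type identifiers for the pages that should not be indexed
NOINDEX_PAGES = {
    "search_results",
    "call_number_browse",
    "browse",
    "advanced_search",
    "lists",
}


def robots_meta(page_type):
    """Return the robots <meta> tag for noindex pages, or '' for indexable ones."""
    if page_type in NOINDEX_PAGES:
        return '<meta name="robots" content="noindex">'
    return ""


print(robots_meta("search_results"))
```

A crawler has to be able to fetch a page to see this tag, so the tag only takes effect on pages that robots.txt does not block.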

Dan Scott (denials) wrote :

BTW, we've been running this in production since mid-January. No ill effects reported yet.

Ben Shum (bshum) wrote :

Pushed to master for 2.8 beta. Thanks Dan!

Changed in evergreen:
status: New → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released