biblio fingerprint should distinguish between elements contributing to the fingerprint

Bug #1528901 reported by Galen Charlton
This bug affects 5 people
Affects    Status        Importance  Assigned to  Milestone
Evergreen  Fix Released  Medium      Unassigned

Bug Description

Consider the movie "Blue steel" and the book "Blue" by Danielle Steel. With typical MARC cataloging of these titles and the default seed data for config.biblio_fingerprint, the same bib fingerprint will be generated: "bluesteel".

This has the effect of putting both titles on the same metarecord, which would lead to confusing results when doing metarecord searches or using advanced hold options in the public catalog.
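To see why the two records collide, here is a minimal Python sketch of fingerprinting by plain concatenation. It is not Evergreen's code (the real rules live in config.biblio_fingerprint); the function names normalize_title, normalize_author and fingerprint are hypothetical, and the sketch assumes the title keeps only lowercase letters and digits while the author contributes just the first word of the name.

```python
import re
from typing import Optional

def normalize_title(title: str) -> str:
    # Hypothetical rule: lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]", "", title.lower())

def normalize_author(author: Optional[str]) -> str:
    # Hypothetical rule: keep only the first word of the author name, lowercased.
    words = re.findall(r"[a-z0-9]+", author.lower()) if author else []
    return words[0] if words else ""

def fingerprint(title: str, author: Optional[str]) -> str:
    # Components are concatenated with nothing marking where one ends.
    return normalize_title(title) + normalize_author(author)

movie = fingerprint("Blue steel", None)        # title only, no individual author
book = fingerprint("Blue", "Steel, Danielle")  # title "Blue", author "Steel, Danielle"
print(movie, book, movie == book)  # bluesteel bluesteel True
```

Once the author's first word is glued onto the title, nothing in the result records where the title stopped.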

Some ways that this problem could be addressed:

- change the stock author component of the fingerprint so that it includes all words, not just the first word. This will reduce the chance of mismatches, but can also result in cases where minor differences in how an author's given names are catalogued cause records for the same work to get different fingerprints
- adjust the fingerprint so that a special separator character is used to distinguish between fields contributing to the fingerprint. That special character could be as simple as a space, e.g. "blue steel" would mean title=blue, author=steel, as opposed to "bluesteel" (title normalizes to "bluesteel", no individual contributor is cataloged).
- solve the general problem of assigning work identifiers
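The first two options can be compared with a similarly hypothetical Python sketch. The normalization rules and the helper names (author_words, fingerprint_all_author_words, fingerprint_separated) are assumptions for illustration, not the code in the eventual patch. Option 2 is modeled with a space as the separator, as suggested above. Because this sketch keeps the empty author slot, the movie's fingerprint ends in a trailing space rather than being exactly "bluesteel", but either way it no longer matches the book's.

```python
import re
from typing import List, Optional

def normalize_title(title: str) -> str:
    # Hypothetical rule: lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]", "", title.lower())

def author_words(author: Optional[str]) -> List[str]:
    # Hypothetical rule: lowercase alphanumeric words of the author name.
    return re.findall(r"[a-z0-9]+", author.lower()) if author else []

def fingerprint_all_author_words(title: str, author: Optional[str]) -> str:
    # Option 1: still concatenated, but every word of the author name is kept.
    return normalize_title(title) + "".join(author_words(author))

def fingerprint_separated(title: str, author: Optional[str], sep: str = " ") -> str:
    # Option 2: first author word as before, but components are joined with a
    # separator the normalizers never emit, so field boundaries survive.
    words = author_words(author)
    return sep.join([normalize_title(title), words[0] if words else ""])

# Option 2 keeps the two records apart.
print(repr(fingerprint_separated("Blue steel", None)))         # 'bluesteel '
print(repr(fingerprint_separated("Blue", "Steel, Danielle")))  # 'blue steel'

# Option 1 also separates them, but depends on how given names are recorded.
print(fingerprint_all_author_words("Blue", "Steel, Danielle"))     # bluesteeldanielle
print(fingerprint_all_author_words("Blue", "Steel, Danielle F."))  # bluesteeldaniellef
```

The last two lines illustrate the drawback noted for option 1: adding a middle initial to the same author's name changes the fingerprint, so two records for one work could land on different metarecords.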

Option 3 is... ambitious. I personally have a slight preference for option 2, but option 1 should be considered as well.

Galen Charlton (gmc)
description: updated
Rogan Hamby (rogan-hamby) wrote : Re: [Bug 1528901] Re: biblio fingerprint should distinguish between elements contributing to the fingerprint

I like option 2. It would address the underlying issue the majority (if
not all) of the times I have seen it.

On Wed, Dec 23, 2015 at 3:54 PM, Galen Charlton <email address hidden> wrote:

> ** Description changed:
>
> Consider the movie "Blue steel" and the book "Blue" by Danielle Steel.
> With typical MARC cataloging of these titles and the default seed data
> - for config.biblio_fingerprint, the same bib finger print will be
> + for config.biblio_fingerprint, the same bib fingerprint will be
> generated: "bluesteel".
>
> This has the effect of putting both titles on the same metarecord, which
> would lead to confusing results when doing metarecord searches or using
> advanced hold options in the public catalog.
>
> Some ways that this problem could be addressed:
>
> - - change the stock author component of the fingerprint so that it
> includes all words, not just the first word. This will reduce the chance
> of mismatches, but can also result in cases where minor differences in how
> an author's given names are catalogued
> + - change the stock author component of the fingerprint so that it
> includes all words, not just the first word. This will reduce the chance
> of mismatches, but can also result in cases where minor differences in how
> an author's given names are catalogued
> - adjust the fingerprint so that a special separator character is used
> to distinguish between fields contributing to the fingerprint. That
> special character could be as simple as a space, e.g. "blue steel" would
> mean title=blue, author=steel, as opposed to "bluesteel" (title normalizes
> to "bluesteel", no individual contributor is cataloged).
> - solve the general problem of assigning work identifiers
>
> Option 3 is... ambitious. I personally have a slight preference for
> option 2, but option 1 should be considered as well.
>
> --
> You received this bug notification because you are subscribed to
> Evergreen.
> Matching subscriptions: evergreenbugs
> https://bugs.launchpad.net/bugs/1528901

Kathy Lussier (klussier) wrote :

+1 to option 2 from me

Kathy Lussier (klussier)
Changed in evergreen:
assignee: nobody → Kathy Lussier (klussier)
Kathy Lussier (klussier)
tags: added: fingerprint metarecords
Galen Charlton (gmc) wrote :

Now writing a patch for this. Because adjusting bib fingerprints should be accompanied by adjusting the metarecord mapping, I'm tackling bug 1488655 as part of this.

Changed in evergreen:
assignee: Kathy Lussier (klussier) → Galen Charlton (gmc)
Galen Charlton (gmc) wrote :

A patch is available at the tip of the user/gmcharlt/lp1528901_distinguish_fps branch in the working/Evergreen repository:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1528901_distinguish_fps

Note that the update depends on the fix for bug 1488655.

Changed in evergreen:
milestone: none → 2.next
status: New → Confirmed
tags: added: pullrequest
Galen Charlton (gmc) wrote :

Note that I'm targeting this to 2.next intentionally; while this patch is just a bugfix, its schema update does end up touching all bibs.

Rogan Hamby (rogan-hamby) wrote :

Tested and looks good; pushing signoff now.

Kathy Lussier (klussier) wrote :
Kathy Lussier (klussier)
Changed in evergreen:
assignee: Galen Charlton (gmc) → nobody
milestone: 2.next → 2.12-beta
Kathy Lussier (klussier) wrote :

Looks good to me too. I added my signoff, but I didn't merge it yet because I would like it to go in around the same time as bug 1553287, which does not yet have a second signoff. My signoff branch is available at:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/kmlussier/lp1528901_more_precise-fingerprinting_signoff

I didn't add a release notes entry because I think it would make more sense to add on to the entry from bug 1553287.

Changed in evergreen:
assignee: nobody → Mike Rylander (mrylander)
Mike Rylander (mrylander) wrote :

It is done! Committed to master. Thanks, Galen, et al!

Changed in evergreen:
assignee: Mike Rylander (mrylander) → nobody
status: Confirmed → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released