Did you mean: diacritics cause erroneous search suggestions, resulting in no hits

Bug #1931625 reported by Michele Morgan
This bug affects 3 people
Affects: Evergreen
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Search suggestions are not derived correctly when the indexed terms contain diacritics.

In a concerto database, records contain the following terms with diacritics:

Bartók, Béla
Dohnányi, Ernst
Konzertstück
Élegie

In the marcxml, the entries for these terms are:

Bartók, Béla,
Dohnányi, Ernst
Konzertstück
Élegie

Keyword searches for the following terms offer these suggestions:

Search term - Suggestion

bartock - bart
dohnini - dohn
konzertstock - konzertst
alegie - legie

None of these suggestions returns any hits.
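The truncated suggestions fit the entity-encoding explanation in the comment below. If a tokenizer splits words at `&`, `#` and `;`, each entity-encoded term breaks into fragments such as `bart` and `legie`, and those fragments then become the dictionary words that suggestions are drawn from. The Python sketch below illustrates the idea. It assumes the diacritics are stored as hexadecimal numeric character references (the exact form in concerto is not shown here). It uses a simple `\w+` regex as a stand-in for the real indexer and `difflib.get_close_matches` as a stand-in for the real suggestion algorithm. None of this is Evergreen's code.

```python
import difflib
import html
import re

# Assumed form of the stored MARCXML values: diacritics as hexadecimal
# numeric character references. The actual encoding in concerto may differ.
STORED_VALUES = [
    "Bart&#xF3;k, B&#xE9;la,",
    "Dohn&#xE1;nyi, Ernst",
    "Konzertst&#xFC;ck",
    "&#xC9;legie",
]


def tokenize(text):
    """Simplified stand-in for an indexer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(values):
    """Collect the distinct tokens that a suggestion dictionary would contain."""
    vocab = set()
    for value in values:
        vocab.update(tokenize(value))
    return sorted(vocab)


def suggest(term, vocab):
    """Closest dictionary word, or None; difflib stands in for the real algorithm."""
    matches = difflib.get_close_matches(term, vocab, n=1, cutoff=0.6)
    return matches[0] if matches else None


raw_vocab = build_vocabulary(STORED_VALUES)  # entities split words into fragments
decoded_vocab = build_vocabulary(html.unescape(v) for v in STORED_VALUES)

for term in ["bartock", "dohnini", "konzertstock", "alegie"]:
    print(f"{term}: {suggest(term, raw_vocab)} vs {suggest(term, decoded_vocab)}")
```

Under these assumptions, the entity-encoded vocabulary yields the same suggestions as the table above (`bart`, `dohn`, `konzertst`, `legie`). Decoding the entities first yields whole words such as `bartók` and `élegie`. This is consistent with the diagnosis but does not prove it.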

Mike Rylander (mrylander) wrote:

At first blush, that looks like broken MARCXML content in concerto. Those aren't UTF-8 characters, but XML entity-encoded Latin-1 code page values. We should only be storing actual UTF-8 in the database (with the exception of the Famous Five that need to be escaped in XML).
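A small Python sketch can illustrate this point. It rewrites numeric character references (such as `&#xF3;`) in a MARCXML string as literal characters, so the stored text is plain UTF-8, and keeps the five predefined XML entities escaped. The function name `decode_numeric_refs` is hypothetical and is not part of Evergreen. The sketch assumes each reference carries a Unicode code point; in the Latin-1 range these coincide with the Latin-1 byte values. A real cleanup would more likely parse and re-serialize each record with an XML library.

```python
import re

# The five characters XML predefines entities for (the "Famous Five").
XML_PREDEFINED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

# Matches hexadecimal (&#xF3;) or decimal (&#243;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_xml_char(code_point: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def decode_numeric_refs(marcxml: str) -> str:
    """Hypothetical helper: turn numeric character references into literal characters.

    Assumes each reference holds a Unicode code point. References that resolve
    to one of the five predefined characters are re-emitted as named entities,
    and references to characters XML does not allow are left untouched.
    """
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not is_xml_char(code_point):
            return match.group(0)
        char = chr(code_point)
        return XML_PREDEFINED.get(char, char)

    return NUMERIC_REF.sub(replace, marcxml)


record = '<subfield code="a">Bart&#xF3;k, B&#xE9;la,</subfield>'
fixed = decode_numeric_refs(record)
print(fixed)
print(fixed.encode("utf-8"))  # the UTF-8 bytes that would be stored
```

Re-escaping the five predefined characters matters: a reference like `&#38;` becomes `&amp;` rather than a bare `&`, which would make the record malformed XML.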

tags: added: didyoumean
