Matches exactly search retrieval issue for 2.5.1

Bug #1267129 reported by Elaine Hardy
This bug affects 4 people
Affects: Evergreen
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

We’ve noticed what appear to be incorrect results in a matches exactly title search in 2.5.1.

In the PINES database test server running 2.5.1, a title contains search for joy of cooking retrieves 56 titles, 36 of which contain the phrase “joy of cooking” in the 245 (main title) field. They include:

Joy of cooking
All new all purpose joy of cooking
Joy of cooking. |p All about party foods & drinks
Joy of cooking. |p All about pies & tarts
Joy of cooking. |p All about canning & preserving
The joy of cooking : |b a compilation of reliable recipes with a casual culinary chat
The joy of cooking Christmas cookies
Stand facing the stove : the story of the women who gave America the Joy of cooking

A matches exactly title search for joy of cooking only retrieves 17 titles. The search results showing just titles for illustration are:

1. Joy of cooking
2. Joy of cooking
3. Joy of cooking
4. Joy of cooking
5. Joy of cooking
6. Joy of cooking
7. Joy of cooking
8. Joy of cooking
9. Joy of cooking.
10. Joy of cooking
11. Joy of cooking
12. Joy of Cooking.
13. Joy of cooking
14. Joy of cooking
15. Joy of cooking.
16. Joy of cooking. |p All about vegetarian cooking
17. JOY OF COOKING.

As you can see, the search retrieves only one record whose title contains a subfield p, even though the contains search returns a number of titles with |a Joy of cooking. followed by a |p.

Also, with that one exception, no title containing the phrase “joy of cooking” in either the 245 |a or |b along with other words or phrases is retrieved.

We did reindex Joy of cooking. |p All about pies & tarts to see if that was the issue, but it was not retrieved in a subsequent search.

Kathy Lussier reports:

Thomas Berezansky tracked down the problem in IRC. The "Matches Exactly" option is changing the search to:
^joy of cooking$

when it should be changing it to:
"^joy of cooking$"

The current behavior says the title should start with "joy", contain "of", and end with "cooking." Most of the Joy of Cooking titles did not end with cooking, so they weren't retrieved, but the vegetarian cooking one does indeed end with cooking.
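The difference between the two forms can be illustrated with a small Python sketch. This is a toy model of the two interpretations, not Evergreen's query parser: the function names `unquoted_anchored` and `quoted_anchored` are hypothetical, case and punctuation are ignored for simplicity, and the subfield delimiters are left out of the sample titles.

```python
import re


def words(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unquoted_anchored(query, title):
    """Toy model of ^a b c$ (no quotes): the first and last words are
    anchored, and every query word only has to appear somewhere."""
    q, t = words(query), words(title)
    if not q or not t:
        return False
    return t[0] == q[0] and t[-1] == q[-1] and all(w in t for w in q)


def quoted_anchored(query, title):
    """Toy model of "^a b c$" (quoted): the whole title must be the phrase."""
    return words(query) == words(title)


titles = [
    "Joy of cooking.",
    "Joy of cooking. All about vegetarian cooking",
    "Joy of cooking. All about pies & tarts",
    "The joy of cooking Christmas cookies",
]

for title in titles:
    print(
        title,
        "| unquoted:", unquoted_anchored("joy of cooking", title),
        "| quoted:", quoted_anchored("joy of cooking", title),
    )
```

Under this model the vegetarian cooking title passes the unquoted test only because its last word happens to be "cooking", while the pies & tarts title fails both tests.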

J. Elaine Hardy
PINES & Collaborative Projects Manager
Georgia Public Library Service
1800 Century Place, Ste 150
Atlanta, Ga. 30345-4304

404.235-7128
404.235-7201, fax
<email address hidden>

Revision history for this message
Elaine Hardy (ehardy) wrote :

This bug has wider implications than I first thought. When searching a title with or without an initial article, results differ. For example, PINES title contains search for “restaurant at the end of the universe” returns 20 titles, 16 with that phrase in the 245. Matches exactly for “restaurant at the end of the universe” returns 4 hits – 2 are compilations and 2 don’t have the initial article. Matches exactly title search for “the restaurant at the end of the universe” returns 14 hits, all with the phrase in the 245.

Elaine Hardy

Revision history for this message
Kathy Lussier (klussier) wrote :

Hi Elaine,

I think this issue is different from the one you initially reported on the bug. My expectation for a "Matches Exactly" search is that the initial article would indeed affect the search results, since initial articles are not ignored in Evergreen keyword searches. As evidenced by the initial bug report, "Matches Exactly" searches aren't working correctly, but when they do work correctly, my understanding is that you need to enter the search terms exactly as they appear in the title from beginning to end, unlike the "contains phrase" search, which just looks for the phrase in the title.

To my knowledge, the browse search is the only search that has been taught to ignore initial articles.

Kathy

Revision history for this message
Elaine Hardy (ehardy) wrote : RE: [Bug 1267129] Re: Matches exactly search retrieval issue for 2.5.1

At one time, matches exactly did not require the search phrase to be at the
beginning of a field. The order of the words within the search phrase was
important, but not the placement within the field. Matches exactly allowed
you to find the exact phrase, in order and without stemming, wherever it
appears in the indexed field. I was unaware they had made that change.

I do forget that they have also changed how Evergreen handles articles, even
though they did that some time ago. At one time, Evergreen did ignore
initial articles based on the filing indicators (at least, that is what I
requested, that the developers said they did, and what early searching
indicated they did). That way, users could find "a is for apple" (which was
an issue with our previous system) and instances where a was an initial
article. That also meant the search "the autobiography of Mark Twain" turned
up the same titles as "autobiography of Mark Twain".

Elaine

Revision history for this message
Dan Scott (denials) wrote :

One request: to be absolutely clear on what's happening, we need to be able to tell when quotes are being used in example searches just to show what the search terms are, versus when the quotes are actually entered in the search field.

For example, I would expect different results if given:

"the autobiography of Mark Twain" <-- with quotes in the search field

vs.

the autobiography of Mark Twain <-- no quotes in the search field

It's always really helpful to have an example MARC record that we can use for checking these operations. Here's the 245 from one that is in the concerto test data set that has non-filing indicators:

   <datafield tag="245" ind1="1" ind2="4">
     <subfield code="a">The father hunt;</subfield>
     <subfield code="b">a Nero Wolfe novel.</subfield>
   </datafield>

Here are the results of the following searches (all searches shown exactly as entered in the basic search box against default "Title" field):

"^the father hunt; a nero wolfe novel$" # 1 result
^the father hunt; a nero wolfe novel$ # 1 result
^the father hunt;$ # 0 results
^the father hunt$ # 0 results
^a nero wolfe novel$ # 0 results
the father hunt # 1 result
"the father hunt" # 1 result
"the hunt a wolfe" # 0 results
the hunt a wolfe # 1 result

All of these match my expectations, with the _possible_ exception of the anchored search requiring both 245 $a + 245 $b.
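To make the role of the two subfields concrete, here is a minimal Python sketch that reads the 245 field above. It assumes, for illustration only, that the title index sees $a and $b joined by a single space; the variable names are hypothetical and this is not how Evergreen builds its index. The second indicator of a MARC 21 245 field gives the number of nonfiling characters.

```python
import xml.etree.ElementTree as ET

MARC_245 = """
<datafield tag="245" ind1="1" ind2="4">
  <subfield code="a">The father hunt;</subfield>
  <subfield code="b">a Nero Wolfe novel.</subfield>
</datafield>
""".strip()

field = ET.fromstring(MARC_245)
subfields = {sf.get("code"): sf.text for sf in field.findall("subfield")}

# Assumption: the indexed title is $a and $b joined by one space.
full_title = " ".join(subfields[c] for c in ("a", "b") if c in subfields)

# MARC 21: the second indicator of 245 is the count of nonfiling characters.
nonfiling = int(field.get("ind2"))
filing_title = full_title[nonfiling:]

print(full_title)    # The father hunt; a Nero Wolfe novel.
print(filing_title)  # father hunt; a Nero Wolfe novel.

# The last word of the combined title is "novel", not "hunt", which is
# consistent with ^the father hunt$ returning 0 results.
print(full_title.rstrip(".").split()[-1])  # novel
```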

Could you look at these and let us know what results did not match your expectations, based on the incoming 245 field and the search queries?

Changed in evergreen:
status: New → Incomplete
Revision history for this message
Kathy Lussier (klussier) wrote :

Looking at the example searches Dan posted above, they all seem to be matching my expectations of how the system, as it exists today, should be returning results.

However, I do think it would be worthwhile to consider the idea of anchored title searches only retrieving results based on 245a. We've had this discussion somewhat recently at MassLNC, and it does seem like, in most cases, users are unaware of the subtitle, and they may have more success with the search if the system were just looking at the 245a.

However, let's say we did this with the above examples. Would the user then need to enter "^the father hunt;$" (with the semicolon at the end) to successfully retrieve search results? If so, I think the user would continue to struggle with these searches because nobody is going to think to add that punctuation.

Overall, I think we have two separate issues here:

1. As Thomas discovered in the earlier discussion, the "Matches Exactly" option on the advanced search page should probably add quotation marks to the search so that a search for the father hunt is transformed to "^the father hunt$" instead of ^the father hunt$.

2. We might want to consider a change so that anchored title searches are only searching on 245a instead of the entire title.

Since both issues led to the results Elaine found with the Joy of Cooking example, I don't know which one belongs on this bug report and which one should be addressed through a new bug report.

Changed in evergreen:
status: Incomplete → Triaged
status: Triaged → Confirmed
Revision history for this message
Kathy Lussier (klussier) wrote :

Just adding a note about issue 1 above. If we were to add quotation marks to the search, as I believe the option was originally intended to work, then the resulting search would be much stricter about punctuation.

Currently, the Matches Exactly option used for a search entered as:

the father hunt: a nero wolfe novel

will turn the search into:

^the father hunt: a nero wolfe novel$

I successfully retrieve the record even though I used a colon instead of a semicolon.

If we were to change the Matches Exactly option so that it turns the search into:

"^the father hunt: a nero wolfe novel$"

then the user would not retrieve the record because they used the wrong punctuation.

Given this information, I'm almost inclined to keep "Matches Exactly" as it is so that we can be more forgiving about punctuation.

On the other hand, by keeping it this way, we find that a "Matches Exactly" search for the help in the MVLC catalog, as an example, will successfully retrieve "The Berenstain Bears hurry to help" because the first word of the title is "the" and the last word is "help."

http://catalog.mvlc.org/eg/opac/results?bool=and;bool=and;bool=and;qtype=title;qtype=title;qtype=author;contains=exact;contains=contains;contains=contains;query=the%20help;query=;query=;_adv=1;locg=1;pubdate=is;page=1
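The trade-off between the two options can be sketched in Python. This is a toy model under stated assumptions, not Evergreen's implementation: `lenient_exact` stands in for the current unquoted behavior (punctuation dropped, only the first and last words anchored), `strict_exact` stands in for the quoted behavior (punctuation kept, whole string compared), and ignoring case and a trailing period in `strict_exact` is an assumption made to agree with the search results reported earlier in this thread.

```python
import re


def tokens(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lenient_exact(query, title):
    """Toy model of the unquoted form: first and last words anchored,
    punctuation ignored, remaining words only need to be present."""
    q, t = tokens(query), tokens(title)
    return bool(q and t) and q[0] == t[0] and q[-1] == t[-1] and set(q) <= set(t)


def strict_exact(query, title):
    """Toy model of the quoted form: the full string, punctuation included,
    must match (case and a trailing period are ignored by assumption)."""
    def norm(s):
        return " ".join(s.lower().split()).rstrip(".")
    return norm(query) == norm(title)


father_hunt = "The father hunt; a Nero Wolfe novel."
berenstain = "The Berenstain Bears hurry to help"

# Colon typed instead of semicolon: forgiven by the lenient form only.
print(lenient_exact("the father hunt: a nero wolfe novel", father_hunt))
print(strict_exact("the father hunt: a nero wolfe novel", father_hunt))

# "the help": the lenient form matches an unrelated title.
print(lenient_exact("the help", berenstain))
print(strict_exact("the help", berenstain))
```

In this model the lenient form forgives the wrong punctuation but also accepts the Berenstain Bears title, while the strict form rejects both.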

In either case, we have examples where the results are likely to frustrate users. We have two consortia that have removed this search option from their advanced search screen because it ultimately is not as helpful as it might appear to be on the surface. I'm inclined to think it should be removed from Evergreen as a whole.

Revision history for this message
Elaine Hardy (ehardy) wrote :

This is a very frustrating search for users. A timely example search from the PINES catalog: Gone girl by Gillian Flynn.

A basic title search gone girl retrieves 208 records

An advanced search title: gone girl and author Flynn, returns 7 titles, including two with the subtitle a novel and the Spanish language Perdida.

A basic title search of "gone girl" retrieves 9 titles -- the 7 Flynn titles and 2 anthologies. Both anthologies include the short story Gone girl by Ross Macdonald.

An advanced title search of gone girl matches exactly retrieves 3 titles. They are all the Flynn book, but the results exclude the records with the subtitle a novel and the Spanish language version, which means the primary title record for the novel is excluded.

In my opinion, to best serve users, the matches exactly search using the drop down menu should retrieve the same result set as a title search using quotes -- "gone girl". A patron may want that Ross Macdonald short story and, definitely, the primary title record for the novel Gone girl by Flynn. If it can't do that, then, yes, I think it should be removed.

Revision history for this message
Kathy Lussier (klussier) wrote :

Elaine,

I think it would be very easy to make the "Matches Exactly" search just do a title search (or keyword or author) using quotes. But, then, aren't we just replicating the "Contains Phrase" search?

Kathy

Revision history for this message
Elaine Hardy (ehardy) wrote :

Yes, although I can see why the intent of the search would be different, I
think practically users would want the same results. I think particularly
if they are searching for a song or short story title, they are going to
expect to find it with a title matches exactly search if they have the exact
title. Certainly, they would expect gone girl to retrieve gone girl a novel.

I think given the diversity in how both a title is expressed on a title page
and thus in the 245 and how a user might search (even with the punctuation
that you illustrated), matches exactly is too restrictive. Unless someone
has a good argument to keep matches exactly, I agree to remove it.

Elaine

Revision history for this message
Andrea Neiman (aneiman) wrote :

Updating tags to match our controlled vocab. Given the significant search improvements since 2014, I would be interested to know whether this is still happening or whether it can be marked wontfix.

tags: added: search
removed: searching
Revision history for this message
Elaine Hardy (ehardy) wrote :

The initial problem illustrated with the Joy of cooking search is still occurring in 3.2.3.

The issue in #7, where a title search with matches exactly does not include gone girl |b a novel, is still occurring.

From #6: title matches exactly father hunt does not retrieve father hunt |b a nero wolfe novel or The father hunt.

I think the underlying discussion of what expectations are for a matches exactly search and the differences from a contains phrase search are still there.

Revision history for this message
Andrea Neiman (aneiman) wrote :

Thanks for confirming, Elaine.

tags: added: needsdiscussion