Title browse has display and sorting issues

Bug #1235497 reported by Dan Wells
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Undecided
Unassigned

Bug Description

The title browse currently displays all the data from our custom <titleNonfiling> tag in MODS32. Since the "title" portion isn't currently being run through "chopPunctuation", we get whatever punctuation happened to be in the MARC field (e.g. "Linux pocket guide /"). Unfortunately, if we do use chopPunctuation on the title, we then end up losing the punctuation needed to separate subtitles from titles (e.g. "vi editor : pocket reference" becomes "vi editor pocket reference"), which is also un-good.

There are a lot of paths to fixing this, but I think the simplest would be to further customize our already custom <titleNonfiling> field and have it pull double-duty. Basically, since we already got rid of <nonSort>, let's complete the process and dump <subTitle> as well, essentially treating the <title> as a preformatted display field. This would leave us with something very simple, like:

            <titleNonfiling>
                <title>
                    <xsl:call-template name="chopPunctuation">
                        <xsl:with-param name="chopString">
                            <xsl:call-template name="subfieldSelect">
                                <xsl:with-param name="codes">abfgk</xsl:with-param>
                            </xsl:call-template>
                        </xsl:with-param>
                    </xsl:call-template>
                </title>
                <xsl:call-template name="part"></xsl:call-template>
            </titleNonfiling>

I think we should have a broader discussion about how we are using/customizing the MODS32 stylesheet, but short term, does anyone see a problem with this? I couldn't find any code which referenced or used <subTitle> directly, so I believe this accomplishes what we need without side-effects.

I'd really like to get this into 2.5.0-rc, since it would involve a lot of reingesting to fix later.
Thoughts? Thanks!

Dan Wells (dbw2)
summary: - Browse display has trailing punctuation
+ Title browse has display and sorting issues
Revision history for this message
Dan Wells (dbw2) wrote :

Ok, I am hijacking my own bug, because the problem goes a little deeper than this. Since the title browse is currently being populated from <titleNonfiling>, and the browse_sort_xpath can only narrow the main metabib_field xpath used for display, the current title browse sorting is including the non-filing characters. I am pretty sure that isn't what we want, since that kinda defeats the purpose of non-filing characters.

What it comes down to is that we cannot use a single metabib_field entry for both search and browse of titles, since we are purposely adding the non-filing characters in the <titleNonfiling> node in a non-separable way, yet we need to separate them off for the browse sorting.

In short, I think we need to properly re-add a 'Title Proper (Browse)' configuration to the default metabib_field entries, and have it configured for browse alone, and turn off the browse settings on 'Title Proper'. Doing so removes the need to change <titleNonfiling> as outlined above, since we can go back to the stock MODS title field, but I think we might want to take the above steps anyway to help out anyone using <titleNonfiling> for display (including our future selves).

Thoughts?

Dan Wells (dbw2)
tags: added: 2.5-release-blocker
Revision history for this message
Kathy Lussier (klussier) wrote :

Thanks for giving this some attention Dan! I just wanted to provide some perspective on the original intent, which was to ignore the non-filing characters when sorting entries, but to display all of those non-filing characters. This sounds like the behavior you outlined above, so I'm giving a +1 to the behavior as described.

At one point in testing the code (perhaps when the additional title index was added?), the browse list was ignoring those initial articles in the sort while displaying them in the list.

Revision history for this message
Dan Wells (dbw2) wrote :

Thanks, Kathy, for the feedback. I am glad we are on the same page.

Okay, and... ugh! It looks like we are going to need another custom MODS title field to pull this off. In testing, we really need both subtitle punctuation and nonSort separation in a single field in order to make browse work optimally. titleNonfiling gives us the subtitle punctuation, and titleInfo gives us the nonSort tag, but nothing gives us both. So, I propose adding something like:

            <titleBrowse>
                <xsl:variable name="titleBrowseChop">
                    <xsl:call-template name="chopPunctuation">
                        <xsl:with-param name="chopString">
                            <xsl:call-template name="subfieldSelect">
                                <xsl:with-param name="codes">abfgk</xsl:with-param>
                            </xsl:call-template>
                        </xsl:with-param>
                    </xsl:call-template>
                </xsl:variable>
                <xsl:choose>
                    <xsl:when test="@ind2>0">
                        <nonSort>
                            <xsl:value-of select="substring($titleBrowseChop,1,@ind2)"/>
                        </nonSort>
                        <title>
                            <xsl:value-of select="substring($titleBrowseChop,@ind2+1)"/>
                        </title>
                    </xsl:when>
                    <xsl:otherwise>
                        <title>
                            <xsl:value-of select="$titleBrowseChop"/>
                        </title>
                    </xsl:otherwise>
                </xsl:choose>
                <xsl:call-template name="part"></xsl:call-template>
            </titleBrowse>

I'm going to go ahead and make a branch for all this, but if this seems like the wrong path, it isn't too late to speak up.

Revision history for this message
Dan Wells (dbw2) wrote :

Ok, my attempt to bring this all together is here:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dbwells/lp1235497_better_mods_for_title_browse_etc

working/collab/dbwells/lp1235497_better_mods_for_title_browse_etc

Please note that this also includes the changes Pasi wrote for #1118245, as it made sense to keep the MODS changes together for a simpler upgrade process.

tags: added: pullrequest
Revision history for this message
Dan Wells (dbw2) wrote :

I should also have noted that the fix is in four commits (including Pasi's) for better separation of issues.

Revision history for this message
Dan Wells (dbw2) wrote :

Force pushed a rebased branch.

Revision history for this message
Liam Whalen (whalen-ld) wrote :

Rather than a custom MODS schema, could the code that populates the new metabib_field use the original MODS Title field and apply the necessary transformations at the time of an INSERT or UPDATE?

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

To respond to Liam, I'm afraid we already have a custom MODS schema for this and other reasons, and the question is one of adding one more custom field.

The necessary information to chop off articles if non-filing indicators are in play doesn't otherwise appear in a MODS rendering of a bib record, so code would have to have both MODS output and the original MARCXML in hand to try something like that, which would be a significant reworking of things.

I am seeing the results I expect from this branch, and am adding my signoff. As I type, related discussion seems to be ongoing in #evergreen, so I'm not actually pushing to master yet, just here: working/collab/senator/lp1235497_better_mods_for_title_browse_etc_signoff

Revision history for this message
Dan Wells (dbw2) wrote :

Liam, thank you very much for the feedback. I do think something like that should be discussed for later versions, but it would require a significantly broader patch, so it is (IMHO) too late for that level of change for the 2.5 release. We have already made a pile of changes to the MODS, so this is (hopefully) one last nudge to get it out the door.

As I noted in the original bug description, we have a lot of layers where we might fix this, but I am of the opinion that fewest changes wins for 2.5. I'd also be happy to see some better solutions for 2.6 :)

Revision history for this message
Liam Whalen (whalen-ld) wrote :

If the community wants to continue with a custom schema. I would suggest renaming the schema and providing a definition that includes the extra fields as well as code to remove the custom fields to get back to the original MODS definition. As well, if any of the APIs are returning these modified MODS records currently, then apply the transformation before returning the data as MODS. Or offer a hook to get MODS data and one to get the data as it is in the custom schema.

I will help in either case (the retaining of the custom schema or moving the custom data out of MODS and into Perl code before an INSERT or UPDATE).

Revision history for this message
Dan Wells (dbw2) wrote :

Jason S. expressed some concerns about the hardcoded config.metabib_field IDs. More specifically, his test server had some conflicts due to bug #1097399. I have a new commit + upgrade script very close to complete which should address these concerns, and will push to the working branch first thing in the morning (I am quite tired, so I just want to give it fresh eyes in the morning before pushing it).

Revision history for this message
Dan Wells (dbw2) wrote :

Okay, added some more commits to the beginning of the branch to handle Jason's concerns, and force pushed an update including Lebbeous's signoffs. This was done with a little more haste than normal, so apologies in advance if there are goof-ups.

Revision history for this message
Dan Wells (dbw2) wrote :

Ben noted in IRC that these changes caused a lot of bloat in the search tables (metabib.title_field_entry, specifically). This surprised me, because I wouldn't expect a config with search_field = false to end up in that table.

It looks like this issue traces back to a change made in commit 6205d9b43d60. I am thinking we need to simply remove the 'OR ind_data.browse_field' part of that change. After all, if anyone wants the behavior described, I would expect search_field to be true as well (in other words, it isn't either/or).

Will push an update to the branch to address this, but in the meantime, more feedback is welcome!

Revision history for this message
Dan Wells (dbw2) wrote :

I think what I said above is incomplete, will push a possible branch shortly with one more change.

Revision history for this message
Mike Rylander (mrylander) wrote :

Dan,

Regarding comment #13, yes, the OR seems incorrect. It's really up to the admin to do the right thing, re autosuggest, with the search and browse flags. There's a case for browse-only now, with bib browse, and I can see autosuggest being enhanced to ignore rows that are linked to index defs that don't have search set to true. But that is a tomorrow problem, IMO.

In short, I'm in favor of reverting 6205d9b43d60.

--miker

Revision history for this message
Dan Wells (dbw2) wrote :

Pushed on two more commits to deal with the duplication. We are now up to eight commits.

But, the new fixes raise a new question. We currently add both a combined row and separate rows to metabib.xxxxxx_field_entry if a config.metabib_field entry has both search_field and browse_field set to true. Should we drop the combined row?

It looks simple to do, but I want some more eyes. For example, rather than inserting (as we do in recent releases):

  id | source | field | value
 756 | 17 | 5 | Concertos, trombone (1991) Meditation, trombone, orchestra Concertos, trombone (1976-77)
 755 | 17 | 5 | Concertos, trombone (1976-77)
 754 | 17 | 5 | Meditation, trombone, orchestra
 753 | 17 | 5 | Concertos, trombone (1991)

we would simply have:

  id | source | field | value |
 755 | 17 | 5 | Concertos, trombone (1976-77)
 754 | 17 | 5 | Meditation, trombone, orchestra
 753 | 17 | 5 | Concertos, trombone (1991)

That looks a lot better to me, and, again, I think would be simple to do. Thoughts?

Revision history for this message
Dan Wells (dbw2) wrote :

I realized the above is only 100% true when browse_xpath is NULL or empty. I am still sure we could be smarter here, but that is perhaps part of a bigger issue.

Revision history for this message
Dan Wells (dbw2) wrote :

After doing some additional simple testing, much of the search code assumes those combined rows will be there, so that last idea is out (for this release, at least). The branch as committed seems to be working correctly for me.

Revision history for this message
Dan Wells (dbw2) wrote :

Mike, thanks for the review. I am sorry I didn't see your comment before I pushed the latest changes, but I think the approach I took in the last two commits is a decent middle ground for keeping the intended effect of 6205d9b43d60 while also honoring the search_field flag. It gets rid of search rows whenever search_field is false, but still preserves Dan's desire for searches like:

title:^Meditation

to work (when testing with the Concerto data set).

I know it seems like browse fields getting into search was an accidental feature in the first place, but it is useful, so until we have a more comprehensive solution, I am hoping what I have done here will suffice.

Revision history for this message
Mike Rylander (mrylander) wrote :

Comment 18 is correct. Please don't avoid the combined row. That's needed across the board, and will continue to be.

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

I ran through tests of Dan's branch starting with master and running the upgrade scripts and starting with a new DB, watching the sorting of bibs in browse go from wrong to right both ways. The function to re-number anything custom in cmf over id 30 seems sane to me, and searching title still works, even in French. The one trade-off is that in browse fields (autosuggest and what you see in the actual browse interface) we do now see l' armée instead of l'armée (space) but we don't see that in search results and searching works fine, so for now it's a reasonable trade off? I think so.

Sign-off force pushed to same place as before: working/collab/senator/lp1235497_better_mods_for_title_browse_etc_signoff

Thanks to Dan and Mike and Ben for going round on this until it's all shipshape!

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

Oh, and Liam: new bug or mailing list topic about exporting "clean" MODS? We may be overlooking use-cases for that. But even before bib/auth browse we had some impurities in our stylesheet that would presumably be showing up already.

Revision history for this message
Dan Wells (dbw2) wrote :

Lebbeous, thank you very much for your review on this. It sounds like there is still a chance that Mike's going to find some time to look this over tomorrow, so I'll wait on that before pushing to master. The more eyes, the better!

Liam, I happen to think getting back to a "pure" MODS sheet, then adding a custom "eg_extracts.xsl" (or whatever), is something we should strongly consider, and I think is at least close to what you are saying. It was discussed briefly at the Hack-a-Way in 2012, but limited tuits have largely prevented any movement on that. That said, I do feel some more urgency on the matter, since we are diverging ever more, so by all means, start the discussion on the list as Lebbeous suggested, and I know at least one person will chime in :)

Revision history for this message
Dan Wells (dbw2) wrote :

Mike weighed in, and after some small tweaks and questions answered, agreed this was good enough for now. Pushed to master, with thanks to all who helped!

Changed in evergreen:
status: New → Fix Committed
Dan Wells (dbw2)
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.