evergreen.maintain_901() can mangle its output

Bug #1028514 reported by Galen Charlton
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Critical
Unassigned
2.1
Fix Released
Undecided
Unassigned
2.2
Fix Released
Critical
Unassigned

Bug Description

If a MARCXML record has some elements that use a namespace prefix and some do that do not, the regexp used to insert the 901 can insert it in the wrong place.

For example, consider:

<marc:record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/MARC21/slim" xmlns:marc="http://www.loc.gov/MARC21/slim" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/ standards/marcxml/schema/MARC21slim.xsd">
<leader>01234cam a22003014a 4500</leader>
<controlfield tag="001">216938607</controlfield>
<controlfield tag="003">OCoLC</controlfield>
<datafield tag="521" ind1=" " ind2=" ">
  <subfield code="a">RL: 3 ; ages 7-12.</subfield>
</datafield>
<datafield tag="800" ind1="1" ind2=" ">
  <subfield code="a">Warner, Gertrude Chandler,</subfield>
  <subfield code="d">1890-1979.</subfield>
  <subfield code="t">Boxcar children mysteries.</subfield>
</datafield>
</marc:record>

maintain_901() would munge the record to something like this:

<marc:record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/MARC21/slim" xmlns:marc="http://www.loc.gov/MARC21/slim" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/ standards/marcxml/schema/MARC21slim.xsd">
<leader>01234cam a22003014a 4500</leader>
<controlfield tag="001">216938607</controlfield>
<controlfield tag="003">OCoLC</controlfield>
<datafield tag="521" ind1=" " ind2=" ">
  <subfield code="a">RL: 3 ; ages 7-12.<datafield tag="901" ind1=" " ind2=" "><subfield code="a">216938607</subfield><subfield code="b">OCoLC</subfield><subfield code="c">5322176</subfield><subfield code="t">biblio</subfield></datafield></subfield>
</datafield>
<datafield tag="800" ind1="1" ind2=" ">
  <subfield code="a">Warner, Gertrude Chandler,</subfield>
  <subfield code="d">1890-1979.</subfield>
  <subfield code="t">Boxcar children mysteries.</subfield>
</datafield>
</marc:record>

That's not a valid MARCXML record. If one is attempting to reingest that record to add the 901 field (see bug 798702 for why one would do that), you can end up with a misleading error message. In particular, running

update biblio.record_entry set id = id where id = 5322176;

for such a record can get you an error like this:

psql:x.sql:2: ERROR: Tag "" is not a valid tag. at /usr/share/perl5/MARC/File/SAX.pm line 92
CONTEXT: PL/Perl function "maintain_control_numbers"

Evergreen: master

Tags: pullrequest
Galen Charlton (gmc)
Changed in evergreen:
milestone: none → 2.3.0-beta1
importance: Undecided → Medium
Revision history for this message
Galen Charlton (gmc) wrote :

Fix is available in the working/Evergreen repository at the tip of branch user/gmcharlt/lp1028514_fix_maintain_901 . Recommend push to master and rel_2_2.

tags: added: pullrequest
Revision history for this message
Dan Scott (denials) wrote :

We've been bitten by the deadly combination of regular expressions & XML several times now. Maybe we should just bite the bullet and implement a plperlu function based on MARC::File::XML instead of evolving the regex in maintain_901()?

Revision history for this message
Dan Scott (denials) wrote :

So, I offer as an alternative user/dbs/maintain_901_via_plperlu in working.

Revision history for this message
Mike Rylander (mrylander) wrote :

I've signed off at user/miker/maintain_901_via_plperlu_signoff with my thanks.

Revision history for this message
Galen Charlton (gmc) wrote :

Performed additional testing (and found another edge cast justifying rewriting the procedure in plperlu). Pushed to master.

Changed in evergreen:
status: New → Fix Committed
Revision history for this message
Galen Charlton (gmc) wrote :

And pushed to rel_2_2 as well

Revision history for this message
Galen Charlton (gmc) wrote :

Dan Scott discovered an issue with the plperlu patch, working on a fix now.

Revision history for this message
Galen Charlton (gmc) wrote :

Fixup pushed to working/Evergreen.git - user/gmcharlt/lp1028514_redux

Revision history for this message
Dan Scott (denials) wrote :

I pointed out in IRC yesterday that there was "one more thing" with the plperlu version - append_fields() does not accept the same sort of arguments that add_fields() accepts, and thus the authority record use case was broken.

I posted a branch yesterday to fix this issue, too (http://evergreen-ils.org/irc_logs/evergreen/2012-07/%23evergreen.27-Fri-2012.log#line223); Thomas said that the "Create Immediately" authority use case was working for him, but it would be good to have someone else test & commit this.

Raising severity to critical, as authority work is broken in all branches.

See user/dbs/unscrewup_maintain901 in working.

Changed in evergreen:
status: Fix Committed → Confirmed
importance: Medium → Critical
Revision history for this message
Dan Scott (denials) wrote :

Thanks to Thomas, who pushed this latest fix to master & rel_2_2. I'm going to push it to rel_2_1 as the same bugs exist in maintain_901 there.

Changed in evergreen:
status: Confirmed → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.