marc_export utility allows the creation of invalid (too large) MARC records

Bug #1397532 reported by Chris Sharp
Affects: Evergreen · Status: Triaged · Importance: Undecided · Assigned to: Unassigned

Bug Description

When doing a bibliographic record export with holdings, the marc_export utility allows the creation of records that exceed the USMARC maximum record size of 99,999 bytes. The MARC Perl libraries emit warnings at runtime that this is happening, but the records are created anyway. A side effect is that the leader, which reserves only 5 character positions for record length, also contains invalid data: only the first five characters of the actual over-99,999 record length are recorded. Example error output from MARC::Lint:

 Invalid record length in record 299: Leader says 10732 bytes but it's actually 107321

As you can see, the actual length of 107321 is truncated to "10732". This causes any MARC processing utility to choke outright, and identifying the offending record is... difficult.
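The truncation mechanism can be sketched in a few lines. This is a minimal Python illustration of the leader arithmetic (the fixed 5-digit length field), not the actual MARC::Record code:

```python
# The ISO 2709 / MARC 21 leader reserves exactly 5 ASCII digits
# (positions 00-04) for the total record length, so the largest
# representable size is 99,999 bytes. A 6-digit length gets cut off.

def leader_length_field(record_length: int) -> str:
    """Format the record length as MARC writers typically do:
    zero-pad to 5 digits, then keep only the first 5 characters."""
    return str(record_length).zfill(5)[:5]

print(leader_length_field(10732))   # "10732" - fits, fine
print(leader_length_field(107321))  # "10732" - silently truncated
```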

My suggestion is that the marc_export script be altered so that:

1) there is more useful debugging information available (the biblio.record.id of the currently processed record would suffice). Perhaps a "--debug" option could be added to the script?

additionally, or instead:

2) there is some mechanism for checking the size (length) of a record, and if it exceeds the MARC length limit, the script does not include it in the export file, but logs the record ID and any errors into an exceptions file.
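The proposed check could look something like the following. This is a hypothetical Python sketch for illustration only (marc_export itself is Perl); the function and file names are invented, and `records` is assumed to yield (bib_id, raw_bytes) pairs:

```python
# Sketch of suggestion 2: skip oversize records and log their IDs
# to an exceptions file instead of writing invalid MARC.
MARC_MAX_BYTES = 99_999

def export_with_exceptions(records, out_path, exceptions_path):
    """Write each record's raw bytes to out_path; divert any record
    over the MARC size limit to a tab-separated exceptions log."""
    with open(out_path, "wb") as out, open(exceptions_path, "w") as exc:
        for bib_id, raw in records:
            if len(raw) > MARC_MAX_BYTES:
                exc.write(f"{bib_id}\trecord is {len(raw)} bytes, "
                          f"exceeds {MARC_MAX_BYTES}\n")
                continue
            out.write(raw)
```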

Evergreen 2.5.1+
OpenSRF 2.2
PostgreSQL 9.3
Ubuntu 12.04/14.04

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I'm tempted to set this to "Won't fix," and add a comment along the lines of "MARC is a broken format. Don't use it."

However, I think this is more of an issue with MARC::Record and friends, since MARC::Record sets the size in the leader. I also think you should check what version of MARC::Record you have installed. I recall seeing code in a recent version that should handle oversize records by setting the size to 99999.

FWIW, I've only ever seen oversized records when exporting holdings, usually for the whole consortium. Most of our vendors have workarounds for this by ignoring the size field and reading to the next record separator. Really, any decent software should ignore the size field since it is wholly unnecessary.
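That vendor workaround amounts to splitting the file on the ISO 2709 record terminator byte (0x1D) rather than trusting each leader's 5-digit length. A minimal Python sketch of the idea, not a full MARC parser:

```python
# Ignore the leader's length field entirely: scan the raw byte
# stream and cut a record at each record terminator (0x1D).
RECORD_TERMINATOR = b"\x1d"

def split_marc_stream(data: bytes):
    """Yield raw records delimited by the terminator byte,
    never consulting the 5-digit length in each leader."""
    for chunk in data.split(RECORD_TERMINATOR):
        if chunk:  # skip the empty slice after the final terminator
            yield chunk + RECORD_TERMINATOR
```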

You could try exporting records with holdings in separate batches for each member library. That should only take you and your vendor until Doomsday to output and to parse.

And, it's practically 2015. Can we have a decent bibliographic record format already?

Changed in evergreen:
status: New → Triaged
Revision history for this message
Jason Stephenson (jstephenson) wrote :

Oh, I should add:

1. I'm not opposed to adding a --debug option. Question is: what would it output?

2. I think if you want to reject oversize records that should also be added via a --strict option or some such. It might also benefit from changes to MARC::Record. It has been a few months since I last looked at the latter, so I don't remember off the top of my head if it can throw errors on oversized records.

Revision history for this message
Chris Sharp (chrissharp123) wrote :

> I also think you
> should check what version of MARC::Record you have installed. I recall
> seeing code in a recent version that should handle oversize records by
> setting the size to 99999.

I'm running 2.0.3, which is the packaged version for Ubuntu 12.04. 2.0.6 (the most recent version) is in the 14.04 repos.

> FWIW, I've only ever seen oversized records when exporting holdings,
> usually for the whole consortium.

Yes, that's what's triggering the issue. Since we have ~285 libraries adding holdings to the same bib records, this problem is more common with PINES than I would expect to see elsewhere.

> Most of our vendors have
> workarounds for this by ignoring the size field and reading to the next record
> separator. Really, any decent software should ignore the size field
> since it is wholly unnecessary.

Yes, at the moment, the vendor we're pushing our files to is working around the invalid records.

> 1. I'm not opposed to adding a --debug option. Question is: what
> would it output?

If possible, I think the bib record ID should be emitted along with any error/warning. That would be immensely helpful.

> 2. I think if you want to reject oversize records that should also be
> added via a --strict option or some such. It might also benefit from
> changes to MARC::Record. It has been a few months since I last looked
> at the latter, so I don't remember off the top of my head if it can
> throw errors on oversized records.

If it can, I like the idea of a --strict.

Thanks!

Chris

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I am unsure of the status of this bug. I'm still tempted to set it to won't fix. I don't think there is much that we can do about oversized records in Evergreen, except possibly split the holdings up into separate, legally-sized records on output. This would mean breaking different rules by having multiple records in the same collection with the same value in 001.

In short, I don't think there is a good solution to the oversized record problem, except that everyone agree to ignore the size field. (Good luck with that!)

In the meantime bug 1502152 proposes a change to at least output the record id of the oversized records on export. While that bug/branch is targeted at master, it should backport to as far back as 2.7 without conflicts.

tags: added: marc
Elaine Hardy (ehardy)
tags: added: cat-importexport cat-marc
removed: marc