marc_export with --items is too damned slow and other things

Bug #1223903 reported by Jason Stephenson
This bug affects 5 people
Affects: Evergreen
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned

Bug Description

EG Version: master as of 20130721
OpenSRF Version: master as of 20130721
PostgreSQL Version: 9.1.9

The marc_export script that comes in the support_scripts directory is just too slow to export anything of any size. I have been requested to dump our full catalog with holdings information on a regular basis. I have my own server set up that can communicate with our production environment to do these sorts of things. You might consider it a utility server, but it isn't really.

Anyway, I started this command two days ago:

10074 pts/1 S+ 0:00 sh -c /openils/bin/marc_export --items >/tmp/topsfield.mrc 2>/dev/null

It is still running and so far, has produced no output:

-rw-rw-r-- 1 jason jason 0 Sep 9 14:09 topsfield.mrc

I have written other export programs for Backstage, etc., that can export records in a matter of minutes to hours instead of days.

http://git.mvlcstaff.org/?p=jason/backstage.git;a=summary

I thought I'd use marc_export on this one, but decided that it needs a rewrite, which is my intention in filing this bug.

Along the way, I intend to also address the following bugs:

https://bugs.launchpad.net/evergreen/+bug/1217875

https://bugs.launchpad.net/evergreen/+bug/1046535

https://bugs.launchpad.net/evergreen/+bug/1075573

https://bugs.launchpad.net/evergreen/+bug/1182253

In the case of the two that already have branches, I will merge those branches into the code.

However, one thought that I have had is to just use the Perl DBI for this. My experience shows that extracting records like this is much faster when done through the DBI layer and not through JSON query calls in CStore. Such a switch might render the above branches obsolete.
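The comment above says direct DBI access is much faster than JSON query calls in CStore, but does not say why. One plausible factor, offered here as an assumption rather than a diagnosis of marc_export, is the number of round trips: fetching holdings with a separate query per bib costs one request per record, while a single joined query returns everything at once. The Python sketch below contrasts the two access patterns using an in-memory SQLite database. The tables `record_entry`, `call_number` and `copy` and the functions `items_per_record` and `items_single_query` are hypothetical simplifications, only loosely modeled on Evergreen's `biblio.record_entry`, `asset.call_number` and `asset.copy`; none of this is the actual marc_export code.

```python
import sqlite3

# Hypothetical, heavily simplified schema loosely modeled on Evergreen's
# biblio.record_entry, asset.call_number and asset.copy tables.
SCHEMA = """
CREATE TABLE record_entry (id INTEGER PRIMARY KEY, marc TEXT, deleted INTEGER);
CREATE TABLE call_number (id INTEGER PRIMARY KEY, record INTEGER, label TEXT, deleted INTEGER);
CREATE TABLE copy (id INTEGER PRIMARY KEY, call_number INTEGER, barcode TEXT, deleted INTEGER);
INSERT INTO record_entry VALUES (1, '<record/>', 0), (2, '<record/>', 0), (3, '<record/>', 1);
INSERT INTO call_number VALUES (10, 1, 'QA76 .A1', 0), (11, 2, 'PS3545 .B2', 0);
INSERT INTO copy VALUES (100, 10, 'B100', 0), (101, 10, 'B101', 0), (102, 11, 'B102', 1);
"""


def build_demo_db():
    """Return an in-memory SQLite connection loaded with demo data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def items_per_record(conn):
    """N+1 pattern: one query for the bib ids, then one query per bib."""
    queries = 1
    bib_ids = [row[0] for row in conn.execute(
        "SELECT id FROM record_entry WHERE deleted = 0 ORDER BY id")]
    result = {}
    for bib_id in bib_ids:
        queries += 1
        rows = conn.execute(
            "SELECT cp.barcode FROM copy cp"
            " JOIN call_number cn ON cn.id = cp.call_number"
            " WHERE cn.record = ? AND cn.deleted = 0 AND cp.deleted = 0"
            " ORDER BY cp.id", (bib_id,))
        result[bib_id] = [row[0] for row in rows]
    return result, queries


def items_single_query(conn):
    """Set-based pattern: one LEFT JOIN returns every live bib with its items."""
    rows = conn.execute(
        "SELECT bre.id, cp.barcode FROM record_entry bre"
        " LEFT JOIN call_number cn ON cn.record = bre.id AND cn.deleted = 0"
        " LEFT JOIN copy cp ON cp.call_number = cn.id AND cp.deleted = 0"
        " WHERE bre.deleted = 0"
        " ORDER BY bre.id, cp.id")
    result = {}
    for bib_id, barcode in rows:
        items = result.setdefault(bib_id, [])  # keep bibs that have no items
        if barcode is not None:
            items.append(barcode)
    return result, 1  # exactly one query was issued


if __name__ == "__main__":
    conn = build_demo_db()
    print(items_per_record(conn))    # ({1: ['B100', 'B101'], 2: []}, 3)
    print(items_single_query(conn))  # ({1: ['B100', 'B101'], 2: []}, 1)
```

Both functions return the same mapping of bib id to item barcodes; the difference is that the first issues one query per bib plus one for the id list, while the second issues a single query regardless of catalog size. If each request carries fixed overhead (network latency, serialization), that overhead grows with the number of bibs in the first pattern but not in the second.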

Jason Stephenson (jstephenson) wrote :

Much of what I said above is BS.

First, I figured out why marc_export produced no output: it was waiting for a list of bib IDs on STDIN. However, when I used the --all option with --items specified, it ran for over 48 hours on my development VM before I stopped it, and by then it had output only about a quarter of our bibs. It seems the current program is indeed too slow.

Second, I am completely reimplementing MARC export in Evergreen, so while I will address the other listed bugs, I will not be merging anyone else's code or even referencing it.

I started working on something during the hackaway, and with some Fieldmapper modifications, I actually got it to work today. However, I'm unsatisfied with my present implementation and will start over.

This time, I'll add a collection of Utility modules for FastExport. Looks like I'm going to put them under OpenILS::Utils. The new marc_export script will use these modules.

The reason for using modules is to make the code reusable in situations other than just a simple command line export script. For instance, I might replace some of the export code I've written in my other custom programs with these modules.

Also, the functionality could be more easily expanded with modules. For instance, modules could be added to compress output and/or upload the files to another server or directory somewhere. These are common tasks done after exporting MARC records. There is no reason that these cannot also be automated.

Jason Stephenson (jstephenson) wrote :

And, we have a branch where I've started the new work with the Fieldmapper modifications:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1223903-MARC-export-Mk.2

Since I was working on a modular approach with all of the modules in one file, I'll likely take the code I've already written, split it up, and modify it into something better than it is. That code has its own branch in working (user/dyrcona/marque-export), but don't use it for anything serious. It is not under active development.

Ben Shum (bshum)
Changed in evergreen:
milestone: none → 2.next
importance: Undecided → Wishlist
status: New → Confirmed
Dan Wells (dbw2)
Changed in evergreen:
milestone: 2.6.0-alpha1 → none
Jason Stephenson (jstephenson) wrote :

I'm dusting this off and will do my best to get it done soon. I'm going with my original approach 'cause there is no time to rewrite it.

Changed in evergreen:
milestone: none → 2.6.0-beta1
Jason Stephenson (jstephenson) wrote :

Made the final push, rebased against master, and added the release notes.

This one is ready for testing.

tags: added: pullrequest
Dan Wells (dbw2) wrote :

Have not reviewed the code, but am commenting early to ask if we should perhaps consider renaming this before it gets committed. I like the name, but once the novelty wears off, we are left with a name where anyone unfamiliar with the code cannot tell what it is, what it does, what it refers to, or (in certain cases) even how to pronounce it. The fact that it is very close to the word "marquee" only adds to the mental clutter, I think.

Anyway, just my practical opinion :)

Jason Stephenson (jstephenson) wrote :

We could always just name it marc_export. It would simplify the Makefile a bit.

Jason Stephenson (jstephenson) wrote :

I force-pushed a rebase onto the above branch that takes into account Dan Wells' concern in comment #5 and my reply in comment #6. We lose the clever Marque.pm; instead, the changes go directly into marc_export. It is cleaner this way, and we don't need an install-exec-hook in Makefile.am.

As far as I am concerned, this one is ready for testing.

Ben Shum (bshum) wrote :

So far as I can tell, the new marc_export (formerly Marque.pm, RIP) works well for my purposes and is SUPER FAST.

Assigning to myself to push tomorrow.

Changed in evergreen:
assignee: nobody → Ben Shum (bshum)
Ben Shum (bshum) wrote :

Also, a reminder to check all the other bugs listed in the description that are superseded or solved by this marc_export rewrite.

Ben Shum (bshum) wrote :

I added the LP numbers to the commit lines and made one tiny typo fix in the help portion, which I squashed into the other commits.

Pushed to master. Thanks Jason!

Changed in evergreen:
status: Confirmed → Fix Committed
assignee: Ben Shum (bshum) → nobody
Changed in evergreen:
status: Fix Committed → Fix Released