marc_export: provide way to exclude OPAC-suppressed items

Bug #2015484 reported by Galen Charlton
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Wishlist
Unassigned

Bug Description

It would be handy to be able to ensure that marc_export only emits records and items that are visible in the OPAC. For example, this would allow Evergreen's OPAC visibility settings to be respected in a discovery layer that's fed from an Evergreen system.

Patch forthcoming to add such an option.

Galen Charlton (gmc)
Changed in evergreen:
importance: Undecided → Wishlist
Revision history for this message
Galen Charlton (gmc) wrote :

A pull request is available in the branch user/gmcharlt/lp2015484_exclude_opac_visible_from_export / https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/gmcharlt/lp2015484_exclude_opac_visible_from_export

tags: added: pullrequest
Revision history for this message
Galen Charlton (gmc) wrote :

I've pushed a follow-up to that branch to ensure that a bare '-e' continues to mean '--encoding'.

tags: added: cat-importexport
Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Hello, I've been trying this out and I think some bibs are not being excluded.

I'm comparing the list of bibs created with this query

\pset format unaligned
SELECT DISTINCT bre.id
  FROM biblio.record_entry AS bre
  JOIN asset.call_number AS acn ON acn.record = bre.id and not acn.deleted
  join asset.copy_vis_attr_cache acvac on acvac.record=bre.id
         and acvac.vis_attr_vector @@ ( SELECT c_attrs::query_int FROM asset.patron_default_visibility_mask() LIMIT 1 )
  WHERE bre.deleted='false'
  and acn.owning_lib in (SELECT id FROM actor.org_unit_descendants(101)) --101 = LARL
  order by 1

and what is exported with the --exclude-hidden --items --descendants LARL arguments.

I'm seeing bibs being included when there is just one copy that has a status that should hidden from the opac. This caused 6346 bibs to be included in the export that are hidden from the opac in our case, out of 125738 total in the marc_export data file.

Josh

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

This statement is made

"If a bib therefore ends up with no
+visible items, it will be excluded from the output."

But I'm not sure I see where this is implemented at. In the next() sub, it doesn't look like there is a check to see if no valid items were found and then to skip the bib.
https://git.evergreen-ils.org/?p=working/Evergreen.git;a=blob;f=Open-ILS/src/support-scripts/marc_export.in;h=748cc7f4b8a8bab38db63773d6f7e221b1879a5e;hb=d1a6702e280a8cef9924b106ad44d587f18356f9#l639

Josh

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

I added a check to skip bibs with no items if the --exclude-hidden is enabled.

It shows me that I don't really understand the copy and bib visibility stuff yet. My query above doesn't give the same results, because the acn checks are ignoring the copy status, and the default patron visibility mask is at the consortium level. But the export looks correct when I spot check it.

I'm thinking that this will mess up the --uris option though, I think this will filter those out since they have no copies attached. So first of all is that a problem that needs to be fixed. Work was done to make --uris and --items work together, but do --uris + --items + --exclude-hidden need to work together?

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Here is a working branch that adds in the check and handles --uris correctly.

If --exclude-hidden + --items + --uris are used, there is an extra check to see if an empty bib has at least one located URI and if it does it doesn't get excluded.

I haven't tested the --since behavior, that seems to include deleted items and call numbers, so I set deleted located URI's to be included also in the test I added.

Working branch at: user/stompro/lp2015484_exclude_opac_visible_from_export_work1

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/stompro/lp2015484_exclude_opac_visible_from_export_work1

Josh

Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I have been looking at this, and I see that Josh's changes work when the --items option is used.

I've experimented a bit to exclude MARC records without opac visible items even when the --items option isn't used. It seems to work, but I think it needs to take into account records with located (or opac visible) URIs.

If I'm off base, let me know before I spend too much time on it.

Seeing this code again, makes me want to refactor the bulk of it into a reusable Perl module and then just have marc_export be a command line wrapper around that library. Oh well, that's a different wishlist bug.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I have signed off on Galen's and Josh's commits and pushed to a collab branch with an additional commit to make the --exclude-hidden option work even when the --items option is not used. (I thought this would be a useful addition.) As always, more testing is requested since I may have missed something, particularly with the changes that I made.

collab/dyrcona/lp2015484_exclude_opac_visible_from_export

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dyrcona/lp2015484_exclude_opac_visible_from_export

Changed in evergreen:
milestone: none → 3.12-beta
status: New → Confirmed
assignee: Jason Stephenson (jstephenson) → nobody
Revision history for this message
Jane Sandberg (sandbergja) wrote (last edit ):

Thanks, Galen, Josh, and Jason. I think it's looking great!

As a follow-up to Jason's work: I noticed that running --exclude-hidden without --items works as expected in cases where all items on a bib have opac_visible=false. However, in cases where items have opac_visible=true but their shelving locations have opac_visible=false, the records are still exported. The same is true for records that have visible items with invisible statuses (e.g. all items on a record have status=Bindery).

Revision history for this message
Jason Stephenson (jstephenson) wrote :

Thanks for the feedback, Jane!

I'll take a look.

Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I am no longer interested in Perl marc_export. I am instead focusing my attention here: https://github.com/kcls/evergreen-universe-rs/blob/main/evergreen/src/bin/marc-export.rs.

I hereby retract my working/signoff branch.

Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
Revision history for this message
Jason Stephenson (jstephenson) wrote (last edit ):

Despite my previous statement, we have been "using" this branch in production at CW MARS, and have found issues with it.

With this code applied, the --descendants option sometimes pulls in too many records, i.e. it includes visible records from libraries others than those that are descendants of the specified org units. It does not seem to always do this for all org. units, but we have at least two in our database where it consistently happens.

We have reverted this patch, and I will see if that resolves the issue for us.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I removed this branch and redid one of the exports. I got the expected number of records (c. 140,000) instead of all the records with visible copies in the database (c. 1.7 million).

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I should add that I was not using the --exclude-hidden option when doing these exports. It looks like this branch has an effect even when the option is not used.

I should further add that I tried this just Galen's original commits and saw the same effect.

tags: added: needswork
removed: pullrequest
Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

I just looked through Galen's patch to see what might have changed when using the --descendant option.

I wonder if the problem is with this line that was removed.
my @ids = map {$_->[0]} @{$r};

and changed to this when --exclude-hidden isn't used.
@ids = map {$_->[0]} grep @{$r};

- my @ids = map {$_->[0]} @{$r};
+ my @ids = ();
+ if ($Marque::config->option_value('exclude-hidden')) {
+ @ids = map {$_->[0]} grep {$_->[1]} @{$r};
+ } else {
+ @ids = map {$_->[0]} grep @{$r};
+ }

Removing that errant grep does seem to fix it for me.

Before change -
marc_export --descendants ADA > adatest.marc
Resulted in 223718 records exported.

After change
marc_export --descendants ADA > adatest2.marc
resulted in 8311 records exported.

Which is much closer to reasonable for a small branch like ADA.

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

The 8311 count matches what I think is a corresponding query for that location.

SELECT DISTINCT bre.id FROM biblio.record_entry bre

JOIN asset.call_number acn ON acn.record=bre.id AND NOT acn.deleted
JOIN actor.org_unit aou ON aou.id=acn.owning_lib AND aou.shortname='ADA'
WHERE
NOT bre.deleted;

(Since we use floating, that number has no relation to reality though :-) )

Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I plan to do a few things with these branches:

1. Pick one to work on. I will likely start with Josh Stompro's "signoff" branch.
2. Rebase the chosen branch on working/collab/dyrcona/lp2041364-marc_export_improvements-partial-signoff from bug 2041364.
3. Fix the issue with the --descendants option as suggested in comment #15.

I think we should have some discussion about this feature:

1. Should the parameter be --exclude-hidden or --limit-to-opac-visible or something else?*
2. Should it only apply when the --items option is used?

* I'd prefer a more precise and more verbose parameter flag. I'd prefer --limit-to-opac-visible over --exclude-hidden, but I can be convinced to stick with the current parameter or something else.

Revision history for this message
Jason Stephenson (jstephenson) wrote (last edit ):

Here's a link to the rebased branch as promised: https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dyrcona/lp2015484_exclude_opac_visible_from_export-rebase

I've tested this with a couple of our regular exports and it gives me results similar to what I expect from the production code.

I'm still testing it, but I think we'll want to use the rebase branch if the code from bug 2041364 got in first.

In fact, that's my preferred order for merging the code.

Edited to add that I'm also "testing" this in production. I think the above is working rather well.

Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
tags: added: pullrequest
removed: needswork
Revision history for this message
Jason Stephenson (jstephenson) wrote :

Here is where we stand with the bug:

The branch from comment #18 is the one that I'm requesting a pullrequest for. It includes the pullrequest branch from bug 2041364, since that is an actual bug and the original branch this feature had some conflicts with the other. I have resolved those in the rebase branch.

I have signed off on others' commits in both branches, but no one has signed off on mine. I recommend testing with this branch applied and signing off here if you're happy with the combined code. We can then push this branch to main, and the other to rel_3_10 and rel_3_11. If you want to be pedantic, you can test the other branch exclusively on either of those releases. (I have tested the combined branch on rel_3_10_3 as well as rel_3_7_4 and main.)

A note about the new feature branch: I have removed the commit that Jane mentioned not working. I decided not to modify the --exclude-hidden option. It only matters if one of the --items, --library, or --descendants flags are used for marc_export.

Testing is relatively simple and yet complex. You should do a couple of exports before applying this code to see what you get. Then, you should apply this code and do some exports with and without the --exclude-hidden option. You should see a difference in numbers of records if you have locations with bibs and/or items that are NOT opac_visible.

It may help testing if you apply the branch from bug 2041364 first and do some exports with that branch alone because of the speed up that it offers.

Also, I'm not too fussed about the option being --exclude-hidden. I didn't change that in the rebase branch, but if anyone thinks it ought to be changed, feel free to chime in here.

And, FWIW, we're using this code in production at C/W MARS as of 2023-11-01.

Changed in evergreen:
assignee: nobody → Jane Sandberg (sandbergja)
Revision history for this message
Jane Sandberg (sandbergja) wrote :

Thanks for the great work on this, Galen, Jason, and Josh! I cherry-picked collab/dyrcona/lp2015484_exclude_opac_visible_from_export-rebase to main for inclusion in 3.12.

tags: added: signedoff
Changed in evergreen:
assignee: Jane Sandberg (sandbergja) → nobody
status: Confirmed → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.