Events with a large amount of data can crash action_trigger_runner.pl

Bug #1858471 reported by Jason Stephenson
This bug affects 2 people
Affects: Evergreen
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

Events that generate a lot of data, such as grouped events on circulations, can cause action_trigger_runner.pl to hang and stop processing events. We have seen this mainly with auto-renewal notice events for patrons who have a large number of items out, where large is 50 or more. It looks like the client disconnects from ejabberd if the gathered data exceeds the ejabberd max_stanza_size. The events being processed end up stuck in the collected state.

Because action_trigger_runner.pl becomes unresponsive, this can affect other events of the same granularity. With our non-granular, run-pending runner, I've seen hundreds of other events (hold ready for pickup notifications, etc.) also get stuck in the collected state.

The action_trigger_runner.pl process also hangs around for hours before it disappears on its own. It is "running" but, as far as I can tell, it is not doing anything useful.

The following message appears in the syslog while processing auto-renewal notices for a patron with just 30 items out:

Jan 6 10:52:00 util2 open-ils.trigger: [WARN:5171:Client.pm:122:] Sending large message of 3037383 bytes to <email address hidden>/open-ils.trigger

The next message appears in /openils/var/log/open-ils.trigger_stderr.log:

Caught error from 'run' method: Exception: OpenSRF::EX::JabberDisconnected 2020-01-06T10:52:05 OpenSRF::Application /usr/local/share/perl/5.22.1/OpenSRF/Application.pm:240 JabberDisconnected Exception: This JabberClient instance is no longer connected to the server

The above are the most recent examples that I have. For other occurrences, the pattern is the same: the "Sending large message" warning appears in the syslog, followed by the JabberDisconnected exception 5 seconds later.
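
To check a syslog for this pattern, something like the following could be used. It is only a sketch: the regular expression is based on the single warning line quoted above and may not match other OpenSRF versions, the function name is made up for illustration, and the 2000000 limit in the example is an assumed value, not one taken from this report.

```python
import re

# Wording taken from the warning quoted in this report; other OpenSRF
# versions may phrase it differently.
LARGE_MSG_RE = re.compile(r"Sending large message of (\d+) bytes")


def oversized_messages(log_lines, max_stanza_size):
    """Return (size, line) pairs for warnings whose size exceeds the limit."""
    hits = []
    for line in log_lines:
        match = LARGE_MSG_RE.search(line)
        if match:
            size = int(match.group(1))
            if size > max_stanza_size:
                hits.append((size, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    sample = [
        "Jan 6 10:52:00 util2 open-ils.trigger: [WARN:5171:Client.pm:122:] "
        "Sending large message of 3037383 bytes to "
        "<email address hidden>/open-ils.trigger",
    ]
    # 2000000 is an assumed max_stanza_size, used only for illustration.
    for size, line in oversized_messages(sample, 2000000):
        print(size, "bytes exceeds the limit:", line)
```

Any line this reports is a candidate for the disconnect described above, so it is worth looking for a JabberDisconnected exception in the stderr log a few seconds after its timestamp.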

Adjusting max_stanza_size to a value above the size of the large message, reloading ejabberd, restarting services, and setting the collected events back to the pending state allows them to process.
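
For picking the new value, one rough approach is to take the largest message size seen in the logs and add some headroom. The helper below is a hypothetical sketch of that arithmetic only; the function name and the 25% default headroom are assumptions, not a recommendation from ejabberd or OpenSRF.

```python
import math


def suggest_max_stanza_size(observed_sizes, headroom=0.25):
    """Return the largest observed size plus a fractional headroom, rounded up.

    observed_sizes: message sizes in bytes, e.g. taken from the
    "Sending large message" warnings in the syslog.
    headroom: extra fraction to add on top (0.25 means 25%); an assumed default.
    """
    if not observed_sizes:
        raise ValueError("need at least one observed message size")
    return math.ceil(max(observed_sizes) * (1 + headroom))


if __name__ == "__main__":
    # 3037383 is the size from the warning quoted in this report.
    print(suggest_max_stanza_size([3037383]))
```

This only treats the symptom: a patron with more items out can still produce a message larger than whatever limit is chosen.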

It looks like one possible solution to this would be to complete the chunking and bundling implementation in open-ils.trigger.
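
To illustrate the general idea of chunking by size, here is a minimal sketch. It is not the open-ils.trigger or OpenSRF implementation, which is Perl and works differently; the function name, the use of JSON to measure size, and the limit are all assumptions made for the example.

```python
import json


def chunk_by_size(items, max_bytes):
    """Group items into lists whose JSON encoding stays within max_bytes.

    An item that is larger than max_bytes on its own is still emitted,
    alone in its own chunk, so the caller can decide how to handle it.
    """
    chunks = []
    current = []
    for item in items:
        candidate = current + [item]
        encoded_size = len(json.dumps(candidate).encode("utf-8"))
        if current and encoded_size > max_bytes:
            # Adding this item would exceed the limit: close the chunk.
            chunks.append(current)
            current = [item]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Stand-in data; real event data would be much larger and nested.
    circulations = [{"id": n, "title": "Item %d" % n} for n in range(50)]
    for chunk in chunk_by_size(circulations, 500):
        print(len(chunk), "items,", len(json.dumps(chunk)), "bytes")
```

Re-encoding the candidate chunk on every step is wasteful for large inputs, but it keeps the sketch short. The point is that each message sent stays under the stanza limit no matter how many items the patron has out.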

tags: added: actiontrigger performance