Perl services can crash with a "Use of freed value in iteration" error

Bug #1953044 reported by Galen Charlton
This bug affects 2 people
Affects: OpenSRF | Status: Confirmed | Importance: Medium | Assigned to: Unassigned

Bug Description

Perl app listeners can occasionally throw the following exception:

server: died with error Use of freed value in iteration at /usr/lib/x86_64-linux-gnu/perl/5.28/IO/Select.pm line 70.

When this happens, the listener will kill its drones and attempt to reset itself (though the reset doesn't work for other reasons that I'll document in a separate bug).

We have seen this in servers running Perl 5.24.2 and 5.28.1; it may well affect other versions of Perl.

The cause appears to be an interaction between how OpenSRF::Server->check_status() sets up IO::Select to check on child pipes and how OpenSRF::Server->reap_children() cleans up dead drones. In particular, if ->reap_children() is invoked while ->check_status() is adding pipes to the IO::Select object, and it happens to reap a child that was on the active list, IO::Select->add() can crash with the error listed above.
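To illustrate the kind of Perl-level failure being described, here is a minimal sketch modelled on the example in perldiag for this error. It is not the OpenSRF code: @active and reap_all() are hypothetical stand-ins for the active list and for reap_children(), and the reap is triggered by hand rather than by a signal. Whether the loop actually dies depends on the Perl build, so the script only reports what happened.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the listener's list of busy drones.
my @active = map { +{ pid => $_ } } 101 .. 103;

# Hypothetical stand-in for reap_children(): drops every tracked child,
# releasing the only references to those hashes.
sub reap_all { @active = () }

my $seen = 0;
my $ok = eval {
    # (undef, @active) is flattened into a list once, before the loop starts.
    # On builds without a reference-counted stack, the loop holds no
    # references of its own to the flattened elements.
    for my $child (undef, @active) {
        reap_all() unless $seen++;    # simulate a reap arriving mid-iteration
    }
    1;
};
my $err = $ok ? '' : $@;
print $err ? "loop died: $err" : "loop completed\n";
```

On the builds where this reproduces, the message caught in $err is the same "Use of freed value in iteration" text as in the listener log, pointing at the loop rather than at the code that emptied the array.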

This bug appears to be very sensitive to changes in Perl's garbage collector and how it manages reference counts to stack variables, which may explain why it stayed hidden for so long.

OpenSRF 3.1+

Tags: pullrequest
Galen Charlton (gmc)
Changed in opensrf:
importance: Undecided → Medium
milestone: none → 3.2.3
Galen Charlton (gmc) wrote :

A patch is available at the tip of

working/user/gmcharlt/lp1953044_fix_freed_value_error / https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1953044_fix_freed_value_error

If anybody can come up with a more deterministic reproduction plan, that would be excellent.

tags: added: pullrequest
Changed in opensrf:
status: New → Confirmed
Jeff Davis (jdavis-sitka) wrote :

So far I haven't been able to reproduce the bug in testing. I've made the suggested changes from the commit message, and after 4000 opensrf.slooooooow.wait requests (200 parallel requests at a time) I'm not seeing the "freed value in iteration" error. My test environment is running Perl 5.30.0 and (roughly) OpenSRF 3.2.2.

Bill Erickson (berick) wrote :

I started hitting this issue frequently when load testing my experimental Redis code. Applying Galen's branch helped, but did not fully resolve it, especially at higher loads. After some experimenting, I found the issue is partly related to the freeing of the child, and partly related to the swapping of the active_list array mid-loop. The "freed value" is the active_list array reference.

Here's another branch that resolved the issue for me by copying the array and sanity checking the array values at runtime:

https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp1953044-loop-freed-value
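For readers who have not opened the branch, the general technique of copying the array and sanity checking its values might look like the sketch below. This is an assumption-laden illustration, not the code from the branch: build_read_set() and the pid/pipe hash keys are hypothetical names, and the real check_status() may be structured quite differently.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: build an IO::Select from a snapshot of the active
# list, skipping entries that are missing or whose pipe is already closed.
sub build_read_set {
    my ($active_list) = @_;

    # Shallow copy: the snapshot holds its own reference to each child hash,
    # so the entries stay alive even if the original list is replaced.
    my @snapshot = @$active_list;

    my $set = IO::Select->new;
    for my $child (@snapshot) {
        next unless ref($child) eq 'HASH';                      # entry was cleared
        my $pipe = $child->{pipe};
        next unless defined $pipe && defined fileno($pipe);     # pipe is gone or closed
        $set->add($pipe);
    }
    return $set;
}

pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";
close $r2;    # simulate a drone whose pipe was closed during cleanup

my $active = [ { pid => 101, pipe => $r1 }, undef, { pid => 102, pipe => $r2 } ];
my $set = build_read_set($active);
printf "%d of %d entries added\n", $set->count, scalar @$active;
```

The copy costs one extra array per call, which seems a reasonable trade for not depending on when a reap happens relative to the loop.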

Changed in opensrf:
milestone: 3.2.3 → 3.2.4
Bill Erickson (berick) wrote :

Just noting we've been running my patch in production for a while. So far so good. We occasionally had the "Use of freed value" issue on our utility server and it has stopped.

Galen Charlton (gmc)
Changed in opensrf:
milestone: 3.2.4 → 3.2.5