Comment 10 for bug 1368737

Rafael David Tinoco (rafaeldtinoco) wrote : Re: Pacemaker can seg fault on crm node online/standby

I have been running the test case for some time and could not get any core dump.

Services seem stable:

Every 1.0s: crm_mon -1 Fri Oct 31 00:52:57 2014

Last updated: Fri Oct 31 00:52:57 2014
Last change: Fri Oct 31 00:31:22 2014 via crm_attribute on clustertrusty04
Stack: corosync
Current DC: clustertrusty02 (12) - partition with quorum
Version: 1.1.10-42f2063
4 Nodes configured
6 Resources configured

Node clustertrusty02 (12): standby
Online: [ clustertrusty01 clustertrusty03 clustertrusty04 ]

 fenceclustertrusty01 (stonith:fence_virsh): Started clustertrusty04
 fenceclustertrusty02 (stonith:fence_virsh): Started clustertrusty03
 fenceclustertrusty03 (stonith:fence_virsh): Started clustertrusty01
 fenceclustertrusty04 (stonith:fence_virsh): Started clustertrusty01
 Resource Group: postfix
     vippostfix (ocf::heartbeat:IPaddr2): Started clustertrusty01
     initpostfix (lsb:postfix): Started clustertrusty01

At this point, stonith_action_clear_tracking_data calls g_source_remove
without any problems, even when it tries to remove a timer that has
already been removed.
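
To illustrate why removing the same timer twice is harmless but noisy, here is a minimal sketch of the usual guard pattern. It is not Pacemaker's code: fake_source_add, fake_source_remove, struct tracked_action and clear_timer are hypothetical names, and the small table only stands in for glib's source list. The one detail taken from glib is that g_source_remove returns FALSE and logs a message when the ID is not found.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for glib's source table: slot i is true while
 * source ID i+1 is attached. Real code would call g_source_remove(). */
#define MAX_SOURCES 32
static bool attached[MAX_SOURCES];

/* Attach a fake source; returns its non-zero ID, or 0 if the table is full. */
static unsigned int fake_source_add(void)
{
    for (unsigned int i = 0; i < MAX_SOURCES; i++) {
        if (!attached[i]) {
            attached[i] = true;
            return i + 1;
        }
    }
    return 0;
}

/* Mimics g_source_remove(): returns false and complains if the ID is unknown. */
static bool fake_source_remove(unsigned int id)
{
    if (id == 0 || id > MAX_SOURCES || !attached[id - 1]) {
        fprintf(stderr,
                "Source ID %u was not found when attempting to remove it\n",
                id);
        return false;
    }
    attached[id - 1] = false;
    return true;
}

/* Hypothetical tracking struct; timer_id == 0 means "no timer pending". */
struct tracked_action {
    unsigned int timer_id;
};

/* Remove the timer at most once, then forget its ID.
 * Returns true only if a source was actually removed. */
static bool clear_timer(struct tracked_action *action)
{
    bool removed = false;

    if (action->timer_id != 0) {
        removed = fake_source_remove(action->timer_id);
        action->timer_id = 0;
    }
    return removed;
}
```

With the ID reset to 0 after the first removal, a second clear_timer call does nothing and logs nothing. Calling fake_source_remove directly with a stale ID only returns false and prints the message, which matches the non-fatal behaviour seen here.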

Judging by the developer comments on that:

"""
The glib behaviour on unbuntu seems reasonable, removing a source multiple times IS a valid error.
I need the stack trace to know where/how this situation can occur in pacemaker.
"""

Those glib error messages (about a source that could not be removed),
which are still being logged:

"""
Oct 31 00:30:20 [2054] clustertrusty03 stonith-ng: error: crm_abort: crm_glib_handler: Forked child 2 197 to record non-fatal assert at logging.c:63 : Source ID 15 was not found when attempting to remove it
"""

can be interpreted as normal and "non-fatal".