Comment 14 for bug 1740892

Nish Aravamudan (nacc) wrote : Re: [Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail

On Mon, Jan 8, 2018 at 8:48 AM, Victor Tapia <email address hidden> wrote:
> As mentioned by Mario @ #10, stopping corosync while pacemaker runs
> throws the same error as the upgrade. Syslog from Xenial +
> corosync=2.3.5-3ubuntu1:
>
> Jan 8 16:24:37 xenial-corosync systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
> Jan 8 16:24:37 xenial-corosync pacemakerd[28747]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Delaying fencing operations until there are resources to manage
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Scheduling Node xenial-corosync for shutdown
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-52.bz2
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Disconnecting from Corosync
> Jan 8 16:24:37 xenial-corosync cib[28748]: warning: new_event_notification (28748-28753-12): Broken pipe (32)
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync attrd[28751]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync lrmd[28750]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync stonith-ng[28749]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
> Jan 8 16:24:37 xenial-corosync systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
>
>
> Pacemakerd shuts down sending SIGTERM to its components, but after the install, corosync does not start pacemaker. BTW, "systemctl restart corosync" restarts both services perfectly
>
> I think that the option A from James Page (#11) is the way to go

I took a quick look in an LXD container after seeing Felipe's and
Victor's posts. This looks like a bug in the systemd unit files, at
least in xenial:

# grep pacemaker /lib/systemd/system/corosync.service
# pacemaker.service, and if you want to exert the watchdog when a

# grep corosync /lib/systemd/system/pacemaker.service
After=corosync.service
Requires=corosync.service
# ExecStopPost=/bin/sh -c 'pidof crmd || killall -TERM corosync'

So, what I see is that corosync.service has no dependency on
pacemaker.service: the only mention of pacemaker in that file is
inside a comment.

pacemaker.service will start after corosync.service (After=), and
when pacemaker.service is shut down, it is stopped before
corosync.service. Additionally, if pacemaker.service is started, then
corosync.service is started as well (Requires=).

Note, nothing specifies what Felipe said -- there is no guarantee that
pacemaker is started, restarted, etc. when corosync is.
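
To double-check which directives are actually active in a unit file
(as opposed to commented out, like the ExecStopPost line above),
something along these lines could be used. This is only a sketch:
unit_deps is a hypothetical helper name, the list of directives it
looks for is my own choice, and the sample input contains just the
three lines from the grep output above, not the full
pacemaker.service file.

```shell
#!/bin/sh
# unit_deps (hypothetical helper): print the uncommented ordering and
# requirement directives in a unit file that mention a given unit.
#   $1 = path to the unit file
#   $2 = unit name to look for
unit_deps() {
    grep -E '^(After|Before|Requires|Wants|BindsTo|PartOf)=' "$1" \
        | grep -F -- "$2"
}

# Sample input: only the lines shown in the grep output above
# (an assumption for illustration, not the complete unit file).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
After=corosync.service
Requires=corosync.service
# ExecStopPost=/bin/sh -c 'pidof crmd || killall -TERM corosync'
EOF

# Lists After= and Requires=; the commented ExecStopPost line is skipped.
unit_deps "$sample" corosync.service
```

On a real system the same function can be pointed at
/lib/systemd/system/pacemaker.service or corosync.service, and
"systemctl show -p After -p Requires pacemaker.service" should report
what systemd has actually loaded.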

I think the next step is to compare against Bionic's systemd unit
files (probably newer) or upstream's, and see whether they differ or
add new dependencies.