On Mon, Jan 8, 2018 at 10:04 AM, Nish Aravamudan
<email address hidden> wrote:
> On Mon, Jan 8, 2018 at 9:51 AM, Nish Aravamudan
> <email address hidden> wrote:
>> On Mon, Jan 8, 2018 at 8:48 AM, Victor Tapia <email address hidden> wrote:
>>> As mentioned by Mario @ #10, stopping corosync while pacemaker runs
>>> throws the same error as the upgrade. Syslog from Xenial +
>>> corosync=2.3.5-3ubuntu1:
>>>
>>> Jan 8 16:24:37 xenial-corosync systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
>>> Jan 8 16:24:37 xenial-corosync pacemakerd[28747]: notice: Invoking handler for signal 15: Terminated
>>> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Invoking handler for signal 15: Terminated
>>> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>>> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Delaying fencing operations until there are resources to manage
>>> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Scheduling Node xenial-corosync for shutdown
>>> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-52.bz2
>>> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
>>> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Disconnecting from Corosync
>>> Jan 8 16:24:37 xenial-corosync cib[28748]: warning: new_event_notification (28748-28753-12): Broken pipe (32)
>>> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Invoking handler for signal 15: Terminated
>>> Jan 8 16:24:37 xenial-corosync attrd[28751]: notice: Invoking handler for signal 15: Terminated
>>> Jan 8 16:24:37 xenial-corosync lrmd[28750]: notice: Invoking handler for signal 15: Terminated
>>> Jan 8 16:24:37 xenial-corosync stonith-ng[28749]: notice: Invoking handler for signal 15: Terminated
>>> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Invoking handler for signal 15: Terminated
>>> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
>>> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
>>> Jan 8 16:24:37 xenial-corosync systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
>>>
>>>
>>> Pacemakerd shuts down by sending SIGTERM to its components, but after the install, corosync does not start pacemaker. BTW, "systemctl restart corosync" restarts both services perfectly.
>>>
>>> I think option A from James Page (#11) is the way to go.
>>
>> I took a quick look at an LXD container after seeing Felipe and
>> Victor's posts. It seems like this is a bug in the xenial (at least)
>> systemd unit files:
>>
>> # grep pacemaker /lib/systemd/system/corosync.service
>> # pacemaker.service, and if you want to exert the watchdog when a
>>
>> # grep corosync /lib/systemd/system/pacemaker.service
>> After=corosync.service
>> Requires=corosync.service
>> # ExecStopPost=/bin/sh -c 'pidof crmd || killall -TERM corosync'
>>
>> So, what I see is that corosync.service has no dependency on
>> pacemaker.service (in the file).
>>
>> pacemaker.service will start after corosync.service, and when
>> pacemaker.service is shut down, it will be stopped before corosync.service.
>> Additionally, if pacemaker.service is started, then corosync.service
>> is started as well.
>>
>> Note that nothing specifies what Felipe said: there is no guarantee that
>> pacemaker is started, restarted, etc. when corosync is.
>>
>> I think the next step is to look at Bionic's systemd services
>> (probably newer) or upstream's and see if there is a difference, or
>> new dependencies added there.
>
> Or perhaps ask upstream what they think is providing this assurance in
> their systemd files, because I'm not seeing it.
>
> If we have a hard dependency between pacemaker and corosync, then I
> think we might need a PartOf directive, in order to ensure they are
> always following the state transitions together.
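For illustration, the PartOf idea might look roughly like this in pacemaker.service. This is a sketch, not the shipped Xenial unit. One caveat from systemd.unit(5): PartOf= only propagates stop and restart from corosync.service to pacemaker.service, not start. On its own, it would therefore not make starting corosync bring pacemaker back up. Requires= already propagates explicit stops and restarts, which matches "systemctl restart corosync" restarting both services.

```ini
# pacemaker.service: sketch of the PartOf idea, not the packaged unit
[Unit]
After=corosync.service
Requires=corosync.service
# Stopping or restarting corosync.service is propagated to this unit.
# Starting corosync.service is NOT propagated by PartOf=.
PartOf=corosync.service
```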
Or, if that is bad (because it does feel like a layering violation, and
maybe it makes sense to have either pacemaker or corosync installed
without the other), pacemaker.service should say
WantedBy=corosync.service
That will ensure that when corosync.service starts, pacemaker.service
starts. The Requires line ensures that when corosync.service stops,
pacemaker stops (with the order specified by the After).
I think :)
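A minimal sketch of the WantedBy= suggestion follows. It assumes the existing [Unit] lines of the Xenial pacemaker.service stay as shown in the grep output. It also assumes any [Install] entries already in the packaged file (not shown in this thread) are kept. WantedBy= only takes effect when the unit is enabled: `systemctl enable pacemaker.service` creates a symlink in /etc/systemd/system/corosync.service.wants/, so that starting corosync.service also pulls in pacemaker.service. After= still orders pacemaker after corosync.

```ini
# pacemaker.service: sketch of the proposal, not the packaged unit
[Unit]
After=corosync.service
Requires=corosync.service

[Install]
# Existing [Install] entries of the packaged unit would be kept as-is.
# When enabled, this adds pacemaker.service to corosync.service.wants/,
# so starting corosync.service also starts pacemaker.service.
WantedBy=corosync.service
```

After changing the unit file, run `systemctl daemon-reload` and then `systemctl reenable pacemaker.service`, so the new .wants/ symlink is actually created on an already-enabled system.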