pcs cluster setup hangs

Bug #1640919 reported by Jonathan Meisel
This bug affects 2 people
Affects        Status         Importance   Assigned to           Milestone
pcs (Debian)   Fix Released   Unknown
pcs (Ubuntu)   Fix Released   Medium       Rafael David Tinoco
  Xenial       Fix Released   Medium       Rafael David Tinoco
  Yakkety      Fix Released   Medium       Rafael David Tinoco
  Zesty        Fix Released   Medium       Rafael David Tinoco

Bug Description

[Impact]

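 * PCS can take a very long time to destroy a cluster.
 * While destroying a cluster, it searches all of /var/lib/ for leftover files using "find".
 * If /var/lib/ contains a large tree, such as the lxcfs FUSE mount (/var/lib/lxcfs), the search can take long enough to look like a hang.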

[Test Case]

 * Install PCS
 * Configure a pacemaker cluster using PCS
 * Make sure /var/lib/ contains a very large tree (for example, a mounted lxcfs)
 * Try to destroy the cluster using PCS
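
To see why the size of the searched tree matters without touching a real /var/lib, the sketch below (Python 3) builds a throwaway tree in a temporary directory and times a name search over the whole tree versus a single subdirectory. The directory names, file counts, and the helpers build_tree and count_matches are illustrative assumptions and are not part of pcs.

```python
import fnmatch
import os
import tempfile
import time


def build_tree(root, dirs=50, files_per_dir=200):
    """Create `dirs` unrelated directories plus one small 'pacemaker' directory."""
    for d in range(dirs):
        path = os.path.join(root, "other%d" % d)
        os.makedirs(path)
        for f in range(files_per_dir):
            open(os.path.join(path, "file%d" % f), "w").close()
    pacemaker = os.path.join(root, "pacemaker", "cib")
    os.makedirs(pacemaker)
    open(os.path.join(pacemaker, "cib-1.raw"), "w").close()


def count_matches(top, pattern):
    """Walk `top`; return (files matching `pattern`, files visited)."""
    matches = visited = 0
    for _dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            visited += 1
            if fnmatch.fnmatch(name, pattern):
                matches += 1
    return matches, visited


with tempfile.TemporaryDirectory() as root:
    build_tree(root)
    # Search the whole tree first, then only the (hypothetical) pacemaker subtree.
    for top in (root, os.path.join(root, "pacemaker")):
        start = time.perf_counter()
        matches, visited = count_matches(top, "cib-*")
        elapsed = time.perf_counter() - start
        print("%s: %d match(es), %d files visited, %.4fs"
              % (top, matches, visited, elapsed))
```

Both searches find the same file, but the broad one has to visit every unrelated entry first. On a FUSE mount each visit can be far slower than on a local disk.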

[Regression Potential]

 * Almost none.
 * Only the purging of leftover files when destroying a cluster could be affected.

[Other Info]

Fixing together (same SRU):

https://bugs.launchpad.net/ubuntu/+source/pcs/+bug/1580035 (xenial)
https://bugs.launchpad.net/ubuntu/+source/pcs/+bug/1580045 (xenial)
And this one (yakkety & zesty)

[Original Description]

PCS cluster setup hangs, apparently due to a "find" command attempting to search through a fuse mountpoint directory (/var/lib/lxcfs/*).
----------------->%-----------------
lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04

-----------------%<-----------------

apt-cache policy pcs
pcs:
  Installed: 0.9.149-1
  Candidate: 0.9.149-1
  Version table:
 *** 0.9.149-1 500
----------------->%-----------------

PCS cluster setup hangs when cleaning up old cluster configurations, apparently due to a "find" command attempting to search through a fuse mountpoint directory (/var/lib/lxcfs/*).

sudo pcs cluster setup --name jmclus1 uby2 uby3
Destroying cluster on nodes: uby2, uby3...
uby2: Stopping Cluster (pacemaker)...
uby3: Stopping Cluster (pacemaker)...
---setup hangs here----

The setup seems to hang because of this line in /usr/lib/python2.7/dist-packages/pcs/cluster.py, which attempts to delete stale cluster configuration XML files:

os.system("find /var/lib -name '"+name+"' -exec rm -f \{\} \;")

sudo find /var/lib -name 'cib-*' 2>&1 | grep 'Permission denied' | wc -l
426
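
A rough Python 3 counterpart to the "Permission denied" count above, for anyone who prefers to inspect the failing paths rather than just count them. This is a sketch, not something pcs does: the helper name unreadable_dirs is made up, and because os.walk only reports directories it fails to list, the number may differ from what find prints.

```python
import os


def unreadable_dirs(top):
    """Return the directories under `top` that os.walk could not list."""
    failed = []

    def on_error(err):
        # os.walk hands over the OSError raised while listing a directory;
        # keep only the permission problems.
        if isinstance(err, PermissionError):
            failed.append(err.filename)

    for _ in os.walk(top, onerror=on_error):
        pass
    return failed


if __name__ == "__main__":
    denied = unreadable_dirs("/var/lib")
    print(len(denied))
```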

Changing this line to the following, so that the search no longer descends into /var/lib/lxcfs (the fuse mountpoint), provided a workaround for me:

os.system("find /var/lib/pacemaker -name '"+name+"' -exec rm -f \{\} \;")
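
The same idea, restricting the cleanup to known directories, can also be written without spawning a shell. The following Python 3 sketch is only an illustration of the approach and is not the patch that went upstream: remove_leftovers and its roots parameter are hypothetical names, and the default root simply mirrors the /var/lib/pacemaker path from the workaround.

```python
import fnmatch
import os


def remove_leftovers(pattern, roots=("/var/lib/pacemaker",)):
    """Delete files matching `pattern` under each root; return the removed paths."""
    removed = []
    for root in roots:
        # Only the listed roots are walked, so unrelated trees such as
        # /var/lib/lxcfs are never entered.
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in fnmatch.filter(filenames, pattern):
                path = os.path.join(dirpath, filename)
                try:
                    os.remove(path)
                except OSError:
                    continue  # like `rm -f`, skip files that cannot be removed
                removed.append(path)
    return removed
```

Calling remove_leftovers("cib-*") would then only ever touch /var/lib/pacemaker. A root that does not exist is silently skipped, since os.walk ignores listing errors by default.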

sosreport-J.Meisel.1640923-20161115104001.tar.xz -- this is an sosreport from before installing pcs / pacemaker / corosync

sosreport-J.Meisel.1640919-20161115105845.tar.xz -- this is an sosreport while pcs cluster setup hung

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in pcs (Ubuntu):
status: New → Confirmed
Changed in pcs (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
description: updated
description: updated
Rafael David Tinoco (rafaeldtinoco) wrote :

So, since I'm fixing:

https://bugs.launchpad.net/ubuntu/+source/pcs/+bug/1580035
https://bugs.launchpad.net/ubuntu/+source/pcs/+bug/1580045
https://bugs.launchpad.net/ubuntu/+source/pcs/+bug/1640923

I created a merge proposal in upstream project:

https://github.com/ClusterLabs/pcs/pull/119

For this particular fix:

https://bugs.launchpad.net/ubuntu/+source/pcs/+bug/1640919

I'll wait for this upstream small change to be accepted so I can propose complete fix to Debian and provide a PPA to be used while Debian is fixed (and then Zesty, Yakkety and finally Xenial).

Rafael David Tinoco (rafaeldtinoco) wrote :

This particular fix was accepted by upstream:

commit c64ad9f24ddd671714a11933a2a0ff6d87df6492
Author: Rafael David Tinoco <email address hidden>
Date: Tue Dec 6 01:34:18 2016 +0000

    Fix: "find" should run only in specific directories

    Some users reported that running find over "/var/lib" for cleanup
    purposes can take too long depending on what you have installed.
    A particular example was having "lxcfs" fuse mounted in /var/lib.
    That can make the search for cluster leftovers to take quite some
    time, making user to believe the process has hang.

Rafael David Tinoco (rafaeldtinoco) wrote :

Related Debian Bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=847294

For the upstream (packaging) fix.

Changed in pcs (Debian):
status: Unknown → New
description: updated
Mattia Rizzolo (mapreri) wrote :

Now, I approved all series nomination (of this and the related bugs), please fix up the metadata accordingly.

For the uploads: I'm always kinda ick to upload a package introducing a delta from debian, but well...
What I don't like are the changelogs: you're barely saying what you did, just listing the name of the patches you added and related bugs. I'd like to see "Add patch <name> to fix <problem>. LP: #xxxxx". This tends to be particularly important for SRUs, where users actually go reading the changelogs to see what they are doing to their stable installation and figure whether the changes affects them.

In the xenial debdiff, you have an invalid Bug-Debian in 0007-Fix-IPv6-bind.patch.

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Mattia,

> For the uploads: I'm always kinda ick to upload a package introducing a delta from debian, but well...

Well, you gotta fix what you gotta fix =).

> What I don't like are the changelogs: you're barely saying what you did, just listing the name of the patches you added and related bugs. I'd like to see "Add patch <name> to fix <problem>. LP: #xxxxx". This tends to be particularly important for SRUs, where users actually go reading the changelogs to see what they are doing to their stable installation and figure whether the changes affects them.

I can definitely add more information if you would like. You can see that the patches are really well explained. Some maintainers prefer a specific format in changelog, some others. I usually use this one but I do agree it doesn't cause any harm to add more information.

> In the xenial debdiff, you have an invalid Bug-Debian in 0007-Fix-IPv6-bind.patch.

Tks for reviewing it, will upload a new one shortly.

Rafael David Tinoco (rafaeldtinoco) wrote :

I attached the debdiffs again with suggestions you made.

Please let me know if you need anything different.

Thank you much for sponsoring/reviewing.

Cheers

Rafael Tinoco

Changed in pcs (Debian):
status: New → Fix Released
Mattia Rizzolo (mapreri) wrote :

Umh, why did you close the Debian bug?

btw, I'll handle this during the day; the patches look good.

Mattia Rizzolo (mapreri) wrote :

All uploaded.
Thank you!

Changed in pcs (Ubuntu Zesty):
status: Confirmed → In Progress
Changed in pcs (Ubuntu Yakkety):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in pcs (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (inaddy)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pcs - 0.9.155-1ubuntu1

---------------
pcs (0.9.155-1ubuntu1) zesty; urgency=medium

  * Patch d/p/0011-Find-on-specific-directories-only.patch (LP: #1640919)
    Cleaning files when destroying a cluster was taking too long in /var/lib.
    Specifying /var/lib/pacemaker for "finding" leftovers solves the issue.

 -- Rafael David Tinoco <email address hidden> Fri, 09 Dec 2016 03:33:30 +0000

Changed in pcs (Ubuntu Zesty):
status: In Progress → Fix Released
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Jonathan, or anyone else affected,

Accepted pcs into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pcs/0.9.153-2ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in pcs (Ubuntu Yakkety):
status: In Progress → Fix Committed
tags: added: verification-needed
Brian Murray (brian-murray) wrote :

Hello Jonathan, or anyone else affected,

Accepted pcs into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pcs/0.9.149-1ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in pcs (Ubuntu Xenial):
status: In Progress → Fix Committed
Felipe Reyes (freyes) wrote :

The version available in xenial-updates does fix this issue. Evidence -> http://pastebin.ubuntu.com/23736947/

tags: added: verification-done-xenial
Felipe Reyes (freyes) wrote :

The version available in yakkety-update does fix this issue. Evidence -> http://pastebin.ubuntu.com/23737093/

tags: added: verification-done-yakkety
tags: added: verification-done
removed: verification-needed
Felipe Reyes (freyes) wrote :

For the record, this is the script used to setup a clean machine with pcs:

sudo apt-get -q update
sudo apt-get -q -y install pcs

# we make sure corosync and pacemaker are stopped
sudo systemctl stop pacemaker
sudo systemctl stop corosync

# we remove corosync.conf, otherwise pcs will refuse to write the config
sudo rm -f /etc/corosync/corosync.conf

# pcs will configure corosync to log under /var/log/cluster
test -d /var/log/cluster || sudo mkdir /var/log/cluster

echo -e "ubuntu\nubuntu" | sudo passwd hacluster
sudo systemctl restart pcsd.service
sleep 10
sudo pcs cluster auth "pcs" -u hacluster -p ubuntu --force --debug
sudo pcs cluster setup --debug --force --name pacemaker1 "pcs"

Felipe Reyes (freyes)
tags: added: sts
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pcs - 0.9.149-1ubuntu1

---------------
pcs (0.9.149-1ubuntu1) xenial; urgency=medium

  * Patch d/p/0006-Replace-orderedhash.patch added by Debian (LP: #1580035)
    Orderedhash gem can't be used due to licensing issues.
    Dependency package orderedhash replaced by activesupport.
    The daemon was not functional in Xenial before this.
  * Patch d/p/0007-Fix-IPv6-bind.patch (LP: #1580045)
    The daemon was being initialized listening only for IPv6 protocol.
  * Patch d/p/0008-Find-on-specific-directories-only.patch (LP: #1640919)
    Cleaning files when destroying a cluster was taking too long in /var/lib.
    Specifying /var/lib/pacemaker for "finding" leftovers solves the issue.

 -- Rafael David Tinoco <email address hidden> Fri, 09 Dec 2016 03:07:25 +0000

Changed in pcs (Ubuntu Xenial):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for pcs has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Felipe Reyes (freyes) wrote :

Brian, is there any reason not to push this SRU into yakkety-updates as well? It went into xenial-updates first :\

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pcs - 0.9.153-2ubuntu1

---------------
pcs (0.9.153-2ubuntu1) yakkety; urgency=medium

  * Patch d/p/0010-Find-on-specific-directories-only.patch (LP: #1640919)
    Cleaning files when destroying a cluster was taking too long in /var/lib.
    Specifying /var/lib/pacemaker for "finding" leftovers solves the issue.

 -- Rafael David Tinoco <email address hidden> Fri, 09 Dec 2016 03:40:55 +0000

Changed in pcs (Ubuntu Yakkety):
status: Fix Committed → Fix Released