corosync locks all its current and future memory

Bug #1911904 reported by Dan Streetman
This bug affects 1 person
Affects            Status        Importance  Assigned to    Milestone
corosync (Ubuntu)  Fix Released  Medium      Dan Streetman
  Bionic           Fix Released  Medium      Dan Streetman
  Focal            Fix Released  Medium      Dan Streetman
  Groovy           Fix Released  Medium      Dan Streetman
  Hirsute          Fix Released  Medium      Dan Streetman

Bug Description

[impact]

as with several other programs, corosync appears to think it's special and needs to have all its memory permanently locked so nothing is ever swapped. Before it does so, it attempts to increase its rlimit to infinity, which is good, as otherwise future memory allocation attempts will fail once the application's memory usage reaches its rlimit.

Unfortunately, while it tries to increase its rlimit, it doesn't actually check if the setrlimit() succeeded. This results in the unfortunate situation of locking all future memory *without* an infinite rlimit, which is essentially guaranteed to cause corosync to fail allocating memory at some point in the future.

[test case]

This causes autopkgtest failures such as bug 1828228, due to corosync crashing on memory allocation failure. It can be reproduced in an unprivileged container by just starting and using corosync.

$ lxc launch ubuntu:groovy lp1911904-g
$ lxc config set lp1911904-g limits.kernel.memlock 64000000
$ lxc stop lp1911904-g
$ lxc start lp1911904-g
$ lxc shell lp1911904-g
root@lp1911904-g:~# prlimit | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-g:~# apt install -y corosync

corosync will fail to start due to bug 1918735, so edit /etc/corosync/corosync.conf to add 'transport: udp' into the totem {} section, then restart corosync
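
The config edit described above could be scripted roughly as follows. This is only a sketch: the function name add_udp_transport is made up for illustration, it assumes GNU sed (for -i) and that the section opens with a line beginning "totem {", and it leaves the file alone if a transport is already set.

```shell
# Hypothetical helper (not part of corosync): add 'transport: udp' to the
# totem {} section of the given corosync.conf, unless a transport is set.
add_udp_transport() {
    conf="$1"
    # Leave the file untouched if an uncommented transport line exists.
    if grep -Eq '^[[:space:]]*transport:' "$conf"; then
        return 0
    fi
    # Append the setting right after the line that opens the totem section.
    sed -i '/^totem[[:space:]]*{/a\
        transport: udp' "$conf"
}
```

In the container this would be run as add_udp_transport /etc/corosync/corosync.conf, followed by a restart of the corosync service.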

root@lp1911904-g:~# pidof corosync
1153
root@lp1911904-g:~# prlimit -p 1153 | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-g:~# grep VmLck /proc/1153/status
VmLck: 36396 kB

note that memory is locked, but the rlimit is not raised to infinity
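
The two checks above can be combined into one helper. This is a sketch, not something shipped with corosync: the name check_memlock is hypothetical, and it assumes a Linux /proc where VmLck appears in /proc/<pid>/status and the soft limit is the fourth field of the "Max locked memory" row in /proc/<pid>/limits.

```shell
# Hypothetical helper: flag a process that has locked memory while its
# MEMLOCK soft limit is finite (the broken state described in this bug).
check_memlock() {
    pid="$1"
    # VmLck is reported in kB.
    locked_kb=$(awk '/^VmLck:/ {print $2}' "/proc/$pid/status")
    # Row looks like: "Max locked memory  <soft>  <hard>  bytes".
    soft_limit=$(awk '/^Max locked memory/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid VmLck=${locked_kb}kB memlock_soft=$soft_limit"
    if [ "${locked_kb:-0}" -gt 0 ] && [ "$soft_limit" != "unlimited" ]; then
        echo "AFFECTED: memory is locked but the rlimit is finite"
        return 1
    fi
    echo "OK: no locked memory, or the rlimit is unlimited"
    return 0
}
```

For example, check_memlock "$(pidof corosync)" would be expected to report AFFECTED for the unfixed package in the container above, since VmLck is non-zero and the limit is 64000000 bytes.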

[regression potential]

any regression likely would involve failure to start corosync, or a memory allocation failure during operation.

[scope]

this is still broken upstream, so it needs to be fixed upstream as well as in all releases.

[other info]

this "worked" before due to systemd enforcing a very low rlimit. In this case, corosync's call to increase its rlimit still failed, but since the rlimit was so low, it was less than corosync's initial memory size, so the mlockall() call also failed, and corosync ignored that (just logged it) and continued just fine without any of its memory locked.

Then, systemd's rlimit was bumped by bug 1830746. corosync (like several other applications that think they are 'special' and should never swap: bug 1890394 and bug 1890394) still fails to increase its rlimit, but now its mlockall() call succeeds. This results in corosync failing a short time later, as its memory usage reaches its rlimit and all its memory allocations fail.

As should be entirely clear, since corosync hasn't been able to lock its memory at all up until now yet seemed to work fine, there is virtually no application out there that *actually* should lock all its memory (qemu is a notable exception when SR-IOV is involved, where it *does* need to lock all memory).

Upstream corosync should completely remove the calls to increase its rlimit and to mlockall(). They are not needed and only cause problems. However, if upstream corosync insists that corosync is special and needs to lock all memory, it *at least* needs to check the result of setrlimit() and avoid mlockall() if it's unable to increase its rlimit.

this is also related to bug 1918735 and bug 1828228

Dan Streetman (ddstreet)
no longer affects: auto-package-testing
Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Hirsute):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Groovy):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Groovy):
importance: Undecided → Medium
Changed in corosync (Ubuntu Hirsute):
importance: Undecided → Medium
status: New → In Progress
Changed in corosync (Ubuntu Groovy):
status: New → In Progress
Changed in corosync (Ubuntu Bionic):
status: New → In Progress
Changed in corosync (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
Changed in corosync (Ubuntu Bionic):
importance: Undecided → Medium
Dan Streetman (ddstreet)
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 3.1.0-2ubuntu2

---------------
corosync (3.1.0-2ubuntu2) hirsute; urgency=medium

  * d/p/lp1911904-Don-t-lock-all-current-and-future-memory-if-can-t-in.patch:
    - Don't mlockall() if setrlimit() fails (LP: #1911904)
  * d/p/lp1918735-try-unprivileged-knet-handle-new.patch:
    - Retry knet_handle_new without privileged flag (LP: #1918735)
  * d/t: don't skip tests now that we fixed crashing in container

 -- Dan Streetman <email address hidden> Wed, 10 Mar 2021 12:55:26 -0500

Changed in corosync (Ubuntu Hirsute):
status: In Progress → Fix Released
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Dan, or anyone else affected,

Accepted corosync into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/3.0.3-2ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in corosync (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-groovy
Brian Murray (brian-murray) wrote :

Hello Dan, or anyone else affected,

Accepted corosync into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/3.0.3-2ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in corosync (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
Changed in corosync (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Brian Murray (brian-murray) wrote :

Hello Dan, or anyone else affected,

Accepted corosync into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/2.4.3-0ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (corosync/2.4.3-0ubuntu1.2)

All autopkgtests for the newly accepted corosync (2.4.3-0ubuntu1.2) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

sbd/1.3.1-2 (i386)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#corosync

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Dan Streetman (ddstreet) wrote :

root@lp1911904-g:~# dpkg -l|grep corosync
ii corosync 3.0.3-2ubuntu3 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu3 amd64 cluster engine common library
root@lp1911904-g:~# pidof corosync
1153
root@lp1911904-g:~# prlimit -p 1153 | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-g:~# grep VmLck /proc/1153/status
VmLck: 36396 kB

root@lp1911904-g:~# dpkg -l|grep corosync
ii corosync 3.0.3-2ubuntu3.1 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu3.1 amd64 cluster engine common library
root@lp1911904-g:~# pidof corosync
1700
root@lp1911904-g:~# prlimit -p 1700 | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-g:~# grep VmLck /proc/1700/status
VmLck: 0 kB

description: updated
Dan Streetman (ddstreet) wrote :

root@lp1911904-f:~# dpkg -l|grep corosync
ii corosync 3.0.3-2ubuntu2 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu2 amd64 cluster engine common library
root@lp1911904-f:~# pidof corosync
1067
root@lp1911904-f:~# prlimit -p 1067 | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-f:~# grep VmLck /proc/1067/status
VmLck: 35536 kB

root@lp1911904-f:~# dpkg -l|grep corosync
ii corosync 3.0.3-2ubuntu2.1 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu2.1 amd64 cluster engine common library
root@lp1911904-f:~# pidof corosync
1633
root@lp1911904-f:~# prlimit -p 1633 | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-f:~# grep VmLck /proc/1633/status
VmLck: 0 kB

Dan Streetman (ddstreet) wrote :

root@lp1911904-b:~# dpkg -l |grep corosync
ii corosync 2.4.3-0ubuntu1.2 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 2.4.3-0ubuntu1.2 amd64 cluster engine common library
root@lp1911904-b:~# pidof corosync
1451
root@lp1911904-b:~# prlimit -p 1451 | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-b:~# grep VmLck /proc/1451/status
VmLck: 0 kB

tags: added: verification-done verification-done-bionic verification-done-focal verification-done-groovy
removed: verification-needed verification-needed-bionic verification-needed-focal verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 3.0.3-2ubuntu3.1

---------------
corosync (3.0.3-2ubuntu3.1) groovy; urgency=medium

  * d/p/lp1911904-Don-t-lock-all-current-and-future-memory-if-can-t-in.patch:
    - Don't mlockall() if setrlimit() fails (LP: #1911904)
  * d/p/lp1918735-try-unprivileged-knet-handle-new.patch:
    - Retry knet_handle_new without privileged flag (LP: #1918735)
  * d/t: don't skip tests now that we fixed crashing in container

 -- Dan Streetman <email address hidden> Wed, 10 Mar 2021 12:58:00 -0500

Changed in corosync (Ubuntu Groovy):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for corosync has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 3.0.3-2ubuntu2.1

---------------
corosync (3.0.3-2ubuntu2.1) focal; urgency=medium

  * d/p/lp1911904-Don-t-lock-all-current-and-future-memory-if-can-t-in.patch:
    - Don't mlockall() if setrlimit() fails (LP: #1911904)
  * d/p/lp1918735-try-unprivileged-knet-handle-new.patch:
    - Retry knet_handle_new without privileged flag (LP: #1918735)
  * d/t: don't skip tests now that we fixed crashing in container

 -- Dan Streetman <email address hidden> Wed, 10 Mar 2021 13:00:12 -0500

Changed in corosync (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 2.4.3-0ubuntu1.2

---------------
corosync (2.4.3-0ubuntu1.2) bionic; urgency=medium

  * d/p/lp1911904-Don-t-lock-all-current-and-future-memory-if-can-t-in.patch:
    - Don't mlockall() if setrlimit() fails (LP: #1911904)

 -- Dan Streetman <email address hidden> Wed, 10 Mar 2021 13:00:12 -0500

Changed in corosync (Ubuntu Bionic):
status: Fix Committed → Fix Released