cgproxy hangs

Bug #1394919 reported by Oliver Grawert
This bug report is a duplicate of: Bug #1377332: [TOPBLOCKER] UI randomly freezes.
This bug affects 4 people
Affects                     Status        Importance  Assigned to  Milestone
apport (Ubuntu)             Invalid       Undecided   Unassigned
apport (Ubuntu RTM)         Invalid       Undecided   Unassigned
cgmanager (Ubuntu)          Confirmed     Critical    Unassigned
cgmanager (Ubuntu RTM)      Confirmed     Critical    Unassigned
ubuntu-app-launch (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

Errors Bucket
-------------
https://errors.ubuntu.com/problem/9a1df90760a88c9b4e5e7e3b4ef450f6b5669c7c

for the past few days my phone has been producing a
/var/crash/_usr_share_apport_recoverable_problem.32011.crash

i see the timestamp updated multiple times and the matching PID from the traceback seems to be cgmanager

https://errors.ubuntu.com/oops/c25e8678-70f2-11e4-976f-fa163e4aaad4
https://errors.ubuntu.com/oops/94398858-70a0-11e4-906f-fa163e339c81
https://errors.ubuntu.com/oops/235b2dd0-7118-11e4-837c-fa163e5bb1a2

are the respective whoopsie uploads.

Oliver Grawert (ogra) wrote :

the timestamps seem to go along with a session crash i see

Victor Tuson Palau (vtuson) wrote :

have we been able to confirm that the crashes are associated with the session crash?

Oliver Grawert (ogra) wrote :

no, we haven't, the timestamps of the crash files just match the time of session crashes i had

tags: added: lt-category-visible
Brian Murray (brian-murray) wrote :

You could probably recreate the apport crash by manually calling recoverable_problem on cgmanager when it is running.

/usr/bin/python3 /usr/share/apport/recoverable_problem -p

This should end up creating the same traceback in apport.

Ted Gould (ted) wrote :

It looks like this is a permissions issue. Unity8 is asking to report a recoverable problem on cgmanager, but Unity runs as the phablet user and cgmanager runs as root. Thus Unity runs recoverable_problem as the phablet user, which doesn't have permission to pull all the data out of cgmanager.

I think that the best solution here is to have recoverable_problem be setuid root.

description: updated
Oliver Grawert (ogra) wrote :

well, there is definitely an issue with cgmanager ... jean baptiste has collected some info ... the apport part is just fallout.

that you can't start apps anymore at some point, or that your session crashes (i'm still not sure this isn't just a coincidence), would be the serious issue ...

Ted Gould (ted) wrote : Re: [Bug 1394919] Re: constant crash in trying to collect info for recoverable error

After talking a bit with security there is no way that we can make it
setuid root since it's run through an interpreter (Python) and we don't
want the interpreter to be setuid root. So since there's no way to
actually report the error I'm proposing a branch to remove that. It
fixes the *immediate* bug that is here, but I'll leave it to others how
they want to track a potential issue in cgmanager.

Ted Gould (ted) wrote : Re: constant crash in trying to collect info for recoverable error

Also, not sure if the apport items should stay open, simply because apport shouldn't segfault in this case but should fail gracefully instead.

Oliver Grawert (ogra) wrote :

cgmanager log when the issue happens: http://paste.ubuntu.com/9156976/

syslog when the issue happens: http://paste.ubuntu.com/9157026/

Oliver Grawert (ogra) wrote :

after this happened, the session recovers on its own but no apps can be started anymore..

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-app-launch - 0.4+15.04.20141121-0ubuntu1

---------------
ubuntu-app-launch (0.4+15.04.20141121-0ubuntu1) vivid; urgency=low

  [ Ted Gould ]
  * Remove reporting a recoverable problem on cgmanager (LP: #1394919)
  * Use a version script to ensure we're not leaking symbols
  * Create a custom GMainContext when waiting on the CGManager DBus
    connection. (LP: #1394622)
 -- Ubuntu daily release <email address hidden> Fri, 21 Nov 2014 21:17:30 +0000

Changed in ubuntu-app-launch (Ubuntu):
status: New → Fix Released
Jean-Baptiste Lallement (jibel) wrote :

Marking as critical, it happened again this morning with 168 without any specific action from the user. I couldn't even reboot the phone, same symptoms as Oliver.

tags: added: rtm14
Changed in cgmanager (Ubuntu RTM):
importance: Undecided → Critical
status: New → Confirmed
Oliver Grawert (ogra) wrote :

i had it several times today that:

a) apps do not start anymore
b) suspended apps do not recover anymore
c) only a system reboot helps to get out of this state

i collected a bunch of logs at http://paste.ubuntu.com/9197719/ while the system is in this state
obviously cgmanager hangs and does not react anymore to any interaction here ...

Oliver Grawert (ogra) wrote :

i also just noticed that my dbus log is full of:

** (zeitgeist-fts:2999): WARNING **: Unable to get info on application://ubuntu-app-launch.desktop
(process:2082): GLib-GObject-WARNING **: /build/buildd/glib2.0-2.41.5/./gobject/gsignal.c:3101: signal id '33' is invalid for instance '0x1829ad8'
(process:2082): GLib-GObject-WARNING **: /build/buildd/glib2.0-2.41.5/./gobject/gsignal.c:3101: signal id '33' is invalid for instance '0x1829ad8'

summary: - constant crash in trying to collect info for recoverable error
+ constant crash in trying to collect info for recoverable error of
+ cgmanager
Launchpad Janitor (janitor) wrote : Re: constant crash in trying to collect info for recoverable error of cgmanager

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apport (Ubuntu):
status: New → Confirmed
Changed in cgmanager (Ubuntu):
status: New → Confirmed
Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

"Me too"ing because I saw this quite a bit over the weekend, making the phone mostly unusable.

Ted Gould (ted) wrote : Re: [Bug 1394919] Re: constant crash in trying to collect info for recoverable error of cgmanager

We landed in Vivid on Friday the fix for the flickering windows and not
reporting the recoverable error. While neither directly relates, this
could possibly be after effects of those. Has anyone seen this issue on
Vivid since Friday nightish?

Oliver Grawert (ogra) wrote : Re: constant crash in trying to collect info for recoverable error of cgmanager

i am pretty sure this is simply the same as bug 1377332
the core issue with cgmanager from this bug was never fixed (it is in critical, incomplete and unassigned state); only workarounds have been put in place to hide the symptoms in ubuntu-app-launch. the actual crash/breakage was never fixed.

according to comment #38 in the above bug i enabled the debug mode for cgmanager now. i believe everyone who can reproduce this bug should do the same so we collect as much info as we can.

Jean-Baptiste Lallement (jibel) wrote :

cgmanager.log with debug mode enabled.

Jean-Baptiste Lallement (jibel) wrote :

cgproxy.log with debug mode enabled.

Jean-Baptiste Lallement (jibel) wrote :

I tried with cgmanager 0.32-4ubuntu1 and the problem remains.

The following command hangs:
dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock --type=method_call /org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Ping "int32:1"

The following command:
dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager.lower/sock --type=method_call /org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Ping "int32:1"
returns:
method return sender=(null sender) -> dest=(null destination) reply_serial=1

The system can be recovered by killing cgproxy
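
The two checks above can be wrapped in a timeout so that a hung socket is reported instead of blocking the shell. This is only a minimal sketch: it assumes GNU coreutils `timeout` is installed (it exits with status 124 when the limit is hit), and the function name `ping_cgmanager` and the 5-second default are illustrative, not part of cgmanager.

```shell
#!/bin/sh
# ping_cgmanager SOCKET [TIMEOUT_SECONDS]   (hypothetical helper)
# Send the Ping method call to a cgmanager D-Bus socket and give up
# after a time limit. Prints "ok" on a reply, "hang" if the time limit
# was reached, and "error" for any other failure.
ping_cgmanager() {
    sock=$1
    limit=${2:-5}
    timeout "$limit" dbus-send --print-reply \
        --address="unix:path=$sock" --type=method_call \
        /org/linuxcontainers/cgmanager \
        org.linuxcontainers.cgmanager0_0.Ping "int32:1" >/dev/null 2>&1
    case $? in
        0)   echo "ok" ;;
        124) echo "hang" ;;   # status 124: GNU timeout killed the command
        *)   echo "error" ;;
    esac
}
```

In the state described above, `ping_cgmanager /sys/fs/cgroup/cgmanager/sock` would be expected to print `hang` and `ping_cgmanager /sys/fs/cgroup/cgmanager.lower/sock` to print `ok`.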

Stéphane Graber (stgraber) wrote :

On the next hang, please report the following:
 - dbus-send to both bus addresses (confirming the hang)
 - ls -lh /proc/$(pidof cgmanager)/fd/
 - ls -lh /proc/$(pidof cgproxy)/fd/
 - gdb -p $(pidof cgmanager) -ex bt
 - gdb -p $(pidof cgproxy) -ex bt
 - dmesg
 - free
 - /var/log/upstart/cgmanager.log
 - /var/log/upstart/cgproxy.log
 - ps aux | grep cgmanager
 - ps aux | grep cgproxy
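
The list above could be gathered in one pass with a small script. This is a sketch only: the helper names `capture` and `collect_cgmanager_state`, the output layout and the `CAPTURE_TIMEOUT` variable are assumptions, GNU `timeout` is assumed to be available, and `-batch` is added to the gdb calls so that gdb exits after printing the backtrace. It would need to run as root to inspect cgmanager.

```shell
#!/bin/sh
# capture OUTDIR NAME COMMAND...
# Run COMMAND with a time limit (so a hung dbus-send cannot block the
# collection) and save its output plus exit status to OUTDIR/NAME.txt.
capture() {
    outdir=$1
    name=$2
    shift 2
    mkdir -p "$outdir"
    timeout "${CAPTURE_TIMEOUT:-10}" "$@" > "$outdir/$name.txt" 2>&1
    echo "exit status: $?" >> "$outdir/$name.txt"
}

# collect_cgmanager_state OUTDIR
# Gather the items requested in the list above into OUTDIR.
collect_cgmanager_state() {
    out=$1
    # Ping both bus addresses; status 124 in the output file means a hang.
    for sock in cgmanager cgmanager.lower; do
        capture "$out" "ping-$sock" dbus-send --print-reply \
            --address="unix:path=/sys/fs/cgroup/$sock/sock" \
            --type=method_call /org/linuxcontainers/cgmanager \
            org.linuxcontainers.cgmanager0_0.Ping "int32:1"
    done
    for daemon in cgmanager cgproxy; do
        # pidof may print several PIDs; handle each one separately.
        for pid in $(pidof "$daemon"); do
            capture "$out" "$daemon-$pid-fd" ls -lh "/proc/$pid/fd/"
            capture "$out" "$daemon-$pid-bt" gdb -p "$pid" -batch -ex bt
        done
        capture "$out" "ps-$daemon" sh -c "ps aux | grep $daemon"
        cp "/var/log/upstart/$daemon.log" "$out/" 2>/dev/null
    done
    capture "$out" dmesg dmesg
    capture "$out" free free
}
```

Running `collect_cgmanager_state /tmp/cgmanager-hang` during the next hang would then leave one file per item in that directory, ready to attach to the bug.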

Jean-Baptiste Lallement (jibel) wrote :

Here are all the logs you requested. The first dbus-send command hangs and the second works.
Perhaps it is a coincidence, but the problem seems to be triggered when the system runs out of memory, kills webapp-container and then tries to restore the gmail webapp (or maybe any other webapp).

In syslog webapp-container is killed at
Nov 25 08:24:23 ubuntu-phablet kernel: [16138.987397]Killing 'webapp-containe' (23587), adj 900,

It almost matches the time of the apport crash (symptom of this bug):
$ grep ^Date /var/crash/_usr_share_apport_recoverable_problem.32011.crash
Date: Tue Nov 25 08:24:50 2014
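
The gap between the two timestamps can be computed rather than eyeballed. A minimal sketch, assuming GNU `date` (for `-d`) and that both timestamps are in the same time zone; `seconds_between` is a hypothetical helper, and the year is added to the syslog timestamp by hand because syslog omits it.

```shell
#!/bin/sh
# seconds_between EARLIER LATER
# Print the number of seconds from EARLIER to LATER. Both arguments are
# parsed by GNU date and treated as UTC so the difference is not
# affected by the local time zone.
seconds_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# Kernel kill message vs. the Date field of the apport crash file.
seconds_between "2014-11-25 08:24:23" "Tue Nov 25 08:24:50 2014"   # prints 27
```

So the apport crash file was written 27 seconds after the kernel killed webapp-container.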

summary: - constant crash in trying to collect info for recoverable error of
- cgmanager
+ cgproxy hangs
Changed in cgmanager (Ubuntu):
importance: Undecided → Critical
Changed in apport (Ubuntu):
status: Confirmed → Invalid
Changed in apport (Ubuntu RTM):
status: New → Invalid
Jean-Baptiste Lallement (jibel) wrote :

I'm marking this report as a duplicate of 1377332 since that is the original issue and apport was the symptom.
I created bug 1396160 for the apport-specific issue.

Ted Gould (ted) wrote : Re: [Bug 1394919] Re: constant crash in trying to collect info for recoverable error of cgmanager

On Tue, 2014-11-25 at 07:53 +0000, Jean-Baptiste Lallement wrote:

> In syslog webapp-container is killed at
> Nov 25 08:24:23 ubuntu-phablet kernel: [16138.987397]Killing 'webapp-containe' (23587), adj 900,

That probably explains the multiple requests in the log. The utility
that runs when a process is killed is cgroup-reap-all, which ensures
that all the PIDs from the cgroup are killed. It does this by killing
them, but also ensuring that the list becomes empty. It can do multiple
requests to ensure that the application isn't avoiding being killed by
trying to "out spawn" the cleanup utility.

http://bazaar.launchpad.net/~indicator-applet-developers/ubuntu-app-launch/trunk.rtm-14.09/view/head:/cgroup-reap-all.c

It could be triggering a race in cgproxy where a PID is leaving the
group, or the group is being destroyed at the same time as the reaper is
asking for the PIDs in it.
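
The kill-until-empty loop described above can be sketched as follows. This is not the code of cgroup-reap-all (which is written in C and queries cgmanager over D-Bus); it is a shell approximation in which `reap_all`, its lister argument and the round limit are all assumptions made for illustration.

```shell
#!/bin/sh
# reap_all LISTER [MAX_ROUNDS]   (hypothetical helper)
# LISTER is a command or function that prints the PIDs currently in the
# group, one per line. The list is re-read on every round so that
# processes spawned between listing and killing are caught on the next
# pass. Returns 0 once the list is empty, 1 if PIDs remain after
# MAX_ROUNDS rounds.
reap_all() {
    lister=$1
    max_rounds=${2:-50}
    round=0
    while [ "$round" -lt "$max_rounds" ]; do
        pids=$("$lister")
        # An empty list means every process in the group is gone.
        [ -z "$pids" ] && return 0
        for pid in $pids; do
            # The PID may already have exited; ignore that failure.
            kill -KILL "$pid" 2>/dev/null || true
        done
        round=$((round + 1))
        sleep 0.1
    done
    return 1
}
```

A lister could be, for example, a function that prints the contents of a cgroup's `cgroup.procs` file. Each round is a separate request for the PID list, which matches the multiple requests seen in the log.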
