cgroup-bin breaks suspend to RAM

Bug #756499 reported by foregam
This bug affects 4 people
Affects: libcgroup (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: cgroup-bin

Release: Ubuntu 10.04.2 LTS
Kernel: 2.6.32-30-generic
Architecture: amd64
Package: cgroup-bin 0.34-0ubuntu2

Description:
Several days ago I installed cgroup-bin in order to try the shell version of the '200-line miracle patch' as described in [1]. Since then, suspend to RAM has been broken, and module unloading is affected too. The bug is 100% reproducible:
1) boot with 'cgred' and 'cgconfig' services enabled;
2) try to suspend, no matter how (closing the lid, pressing the power button, 'Suspend' from the Indicator applet, pm-suspend, echo mem > /sys/power/state, etc.);
3) observe your computer being rendered completely unresponsive to any keyboard or mouse event;
4) hard reboot.

Suspending after stopping 'cgred' and 'cgconfig' or removing cgroup-bin works as expected.

'service cgconfig stop' says:
Stopping cgconfig service: sed: couldn't flush stdout: No such process
rmdir: failed to remove `./sysdefault': Device or resource busy

The laptop model is Compaq 610, specs can be found at [2].

lspci output:
00:00.0 Host bridge: Intel Corporation Mobile GME965/GLE960 Memory Controller Hub (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation Mobile GME965/GLE960 Integrated Graphics Controller (rev 0c)
00:02.1 Display controller: Intel Corporation Mobile GME965/GLE960 Integrated Graphics Controller (rev 0c)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
10:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
30:00.0 Ethernet controller: Marvell Technology Group Ltd. Device 4357 (rev 10)

[1] http://www.webupd8.org/2010/11/alternative-to-200-lines-kernel-patch.html
[2] http://h18000.www1.hp.com/products/quickspecs/13304_div/13304_div.PDF

Jon Bernard (jbernard) wrote :

This sounds a lot like Debian bug #555711 [1]. Can you take a look at that and see if the proposed fix solves your problem?

[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=555711

foregam (foregam) wrote :

Yes, that does the trick. Here's the new cgconfig.conf:

group sysdefault {
    cpu {
        cpu.rt_runtime_us = 500000;
    }
}

mount {
    cpu = /dev/cgroup/cpu;
    cpuacct = /dev/cgroup/cpuacct;
}

BTW the stock config file defined /mnt/cgroups as the root for cgroup mounts. I think the default should be changed to a) make /dev/cgroup the root directory for cgroup mounts; b) include the 'group sysdefault {...}' stanza shown above.
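
One way to confirm that a configuration like the one above took effect is to read the value back from the mounted hierarchy. This is only a sketch: it assumes the cpu controller is mounted at /dev/cgroup/cpu, so that the sysdefault group appears as /dev/cgroup/cpu/sysdefault, and the function names rt_runtime_us and has_rt_budget are made up for illustration. It also assumes the breakage is tied to a zero real-time budget in the group, which the linked Debian bug suggests.

```shell
#!/bin/sh
# Print the real-time runtime budget (microseconds) of one cpu cgroup.
# The default path is an assumption based on the mount point used above.
rt_runtime_us() {
    group_dir="${1:-/dev/cgroup/cpu/sysdefault}"
    cat "$group_dir/cpu.rt_runtime_us"
}

# Succeed only if the group has a non-zero real-time budget.
has_rt_budget() {
    value=$(rt_runtime_us "$1") || return 1
    [ "$value" -ne 0 ]
}
```

After restarting cgconfig, `rt_runtime_us` with no argument should print the value set in cgconfig.conf, and `has_rt_budget || echo "no RT budget"` gives a quick yes/no check.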

Changed in libcgroup (Ubuntu):
status: New → Triaged
Serge Hallyn (serge-hallyn) wrote :

I believe this is fixed in current oneiric (and natty) packages. Please re-open if I'm wrong about that.

Changed in libcgroup (Ubuntu):
status: Triaged → Fix Released
importance: Undecided → Medium
Serge Hallyn (serge-hallyn) wrote :

Another report suggests this is not in fact fixed.

Changed in libcgroup (Ubuntu):
status: Fix Released → Confirmed
Stéphane Graber (stgraber) wrote :

Right, in my case with current cgroup-bin, I observe the following:
 1) Boot my laptop with cgroup-bin
 2) Suspend
 3) Resume
 4) Works fine
 5) Suspend
 6) System freezes, unresponsive

So I can do an initial suspend/resume sequence before I get hit by the bug.
Removing cgroup-bin fixes the problem for me.

It looks like a kernel panic or similar, as the keyboard doesn't work at all, and neither does the power button or even SysRq sequences.

William Grant (wgrant) wrote :

I was seeing exactly the same symptoms as stgraber (first suspend worked), and removing cgroup-bin indeed solved it.

Rioting_Pacifst (rioting-pacifist) wrote :

I'm marking this as a duplicate of #693594, but if you come across this report, here are some workarounds/fixes:

1) Do not move all tasks into a default cgroup:
in /etc/default/cgconfig, comment out CREATE_DEFAULT=yes and uncomment CREATE_DEFAULT=no.

2) Put the following in /etc/cgconfig.conf (950000 is a better value, as it is the kernel's default anyway):
group sysdefault {
    cpu {
        cpu.rt_runtime_us = 950000;
    }
}

3) Add a rule so that [kthreadd] is never put in the default group
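
Workaround 1 can be applied with a small sed edit. This is a sketch, not a tested fix: it assumes the file contains an active CREATE_DEFAULT=yes line and a commented-out #CREATE_DEFAULT=no line, as described in workaround 1, and that GNU sed (for -i) is available. The function name disable_default_cgroup is hypothetical.

```shell
#!/bin/sh
# Switch CREATE_DEFAULT from "yes" to "no" in a cgconfig defaults file.
# The path argument is optional; /etc/default/cgconfig is the file named above.
disable_default_cgroup() {
    conf="${1:-/etc/default/cgconfig}"
    # First expression comments out the active "yes" line,
    # second expression uncomments the "no" line.
    sed -i \
        -e 's/^[[:space:]]*CREATE_DEFAULT=yes/#CREATE_DEFAULT=yes/' \
        -e 's/^[[:space:]]*#[[:space:]]*CREATE_DEFAULT=no/CREATE_DEFAULT=no/' \
        "$conf"
}
```

Run it as root (for example `disable_default_cgroup /etc/default/cgconfig`) and restart the cgconfig service afterwards. Trying it on a copy of the file first is a cheap way to check the result.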

In case anybody was wondering what is going on:
Suspend fails because it cannot shut down your extra CPUs (you can test this with echo 0 > /sys/devices/system/cpu/cpu$x/online).
The kernel can't shut down the extra CPUs because it spawns some threads (ksoftirqd and migration) that need realtime scheduling, and because they are in the sysdefault subgroup, the kernel will not grant it. The stuck thread will show up in state D (uninterruptible sleep); you can fix this by echoing its PID to /sys/fs/cgroup/cpu/tasks.

Note that on some kernels (e.g. 3.0.0, which is shipped with 11.10) you can shut down each CPU once before the problem occurs (e.g. one suspend), but on others (e.g. 3.3.0-rc4) you can't.
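
The diagnosis above can be turned into a rough shell sketch: find tasks stuck in state D, then write a PID to the root cpu cgroup's tasks file. The function names list_d_state_pids and move_to_root_cpu_cgroup are hypothetical, and the optional path arguments exist only so the functions can be pointed somewhere other than /proc and /sys/fs/cgroup/cpu/tasks. Not every D-state task is one of the stuck kernel threads, so check the PIDs before moving anything.

```shell
#!/bin/sh
# Print the PIDs of all tasks in uninterruptible sleep (state D).
list_d_state_pids() {
    proc_root="${1:-/proc}"
    for stat in "$proc_root"/[0-9]*/stat; do
        [ -r "$stat" ] || continue
        line=$(cat "$stat" 2>/dev/null) || continue
        # Format is "pid (comm) state ..."; comm may contain spaces,
        # so drop everything up to the last ") " before reading the state.
        rest=${line##*") "}
        state=${rest%% *}
        pid=${line%% *}
        if [ "$state" = "D" ]; then
            echo "$pid"
        fi
    done
}

# Move one PID into the root cpu cgroup by writing it to the tasks file.
move_to_root_cpu_cgroup() {
    pid="$1"
    tasks_file="${2:-/sys/fs/cgroup/cpu/tasks}"
    echo "$pid" > "$tasks_file"
}
```

A typical session, run as root after a failed attempt to take a CPU offline, would call `list_d_state_pids`, inspect each PID with `ps -p PID -o pid,stat,comm`, and then run `move_to_root_cpu_cgroup PID` for the stuck ksoftirqd or migration thread.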
