bogus utime and stime in /proc/<PID/stat

Bug #1023214 reported by alecm3
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Expired
High
Unassigned

Bug Description

I administer a large number of servers, and I have this problem only with Ubuntu 10.04 LTS: I run a server under normal load (say load average 3.0 on an 8-core server). The "top" command shows processes taking certain % of CPU that cause this load average: say

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11008 mysql 20 0 25.9g 22g 5496 S 67 76.0 643539:38 mysqld

ps -o pcpu,pid -p11008
%CPU PID
53.1 11008

everything is consistent.

The all of the sudden, the process causing the load average disappears from "top", but the process continues to run normally (albeit with a slight performance decrease), and the system load average becomes somewhat higher. The output of ps -o pcpu becomes bogus:

# ps -o pcpu,pid -p11008
%CPU PID
317910278 1587

This happened to at least 5 different severs (different brand new IBM System X hardware), each running different software: one httpd 2.2, one mysqld 5.1, and one Twisted Python TCP servers. Each time the kernel was between 2.6.32-32-server and 2.6.32-40-server. I updated some machines to 2.6.32-41-server, and it has not happened on those yet, but the bug is rare (once every 60 days or so).

This is from an affected machine:

Code:
top - 10:39:06 up 73 days, 17:57, 3 users, load average: 6.62, 5.60, 5.34
Tasks: 207 total, 2 running, 205 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.4%us, 18.0%sy, 0.0%ni, 66.3%id, 4.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 74341464k total, 71985004k used, 2356460k free, 236456k buffers
Swap: 3906552k total, 328k used, 3906224k free, 24838212k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
805 root 20 0 0 0 0 S 3 0.0 1493:09 fct0-worker
982 root 20 0 0 0 0 S 1 0.0 111:35.05 fioa-data-groom
914 root 20 0 0 0 0 S 0 0.0 884:42.71 fct1-worker
1068 root 20 0 19364 1496 1060 R 0 0.0 0:00.02 top

Nothing causing high load is showing on top, but I have two highly loaded mysqld instances on it, that suddenly show crazy %CPU:

#ps -o pcpu,pid,cmd -p1587
%CPU PID CMD
317713124 1587 /nail/encap/mysql-5.1.60/libexec/mysqld
and

#ps -o pcpu,pid,cmd -p1624
%CPU PID CMD
2802 1624 /nail/encap/mysql-5.1.60/libexec/mysqld

Here are the numbers from

# cat /proc/1587/stat
1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 85773299069 4611685932654088833 0 0 20 0 52 0 3549 27255418880 5483524 18446744073709551615 4194304 11111617 140733749236976 140733749235984 8858659 0 552967 4102 26345 18446744073709551615 0 0 17 5 0 0 0 0 0

The 14th and 15th numbers according to

man proc and fs/proc/array.c are supposed to be:

utime %lu Amount of time that this process has been scheduled in user mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK). This includes guest time, guest_time (time spent running a virtual CPU, see
below), so that applications that are not aware of the guest time field do not lose that time from
their calculations.

stime %lu Amount of time that this process has been scheduled in kernel mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK).

On a normal server, these numbers 13 orders of magnitude smaller and they are advancing by about 10 per sec. On a buggy server, these numbers are stuck at a ridiculously high value like 4611685932654088833, and they are not changing.

I noticed that this bug does not happen on 2.6.24-26-server , and there has been many changes in fs/proc/array.c between 2.6.24-26-server and 2.6.32-32-server when this bug starting to occur.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-server 2.6.32.40.47
Regression: Yes
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.32-40.87-server 2.6.32.57+drm33.23
Uname: Linux 2.6.32-40-server x86_64
NonfreeKernelModules: iomemory_vsl
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
Date: Tue Jul 10 17:33:10 2012
Frequency: Once a month.
InstallationMedia: Ubuntu-Server 10.04.2 LTS "Lucid Lynx" - Release amd64 (20110211.1)
MachineType: IBM System x3550 M3 -[7944AC1]-
PciMultimedia:

ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-40-server root=UUID=ef643bc9-46ac-474d-bb00-4e0bfee77808 ro quiet
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 02/08/2011
dmi.bios.vendor: IBM Corp.
dmi.bios.version: -[D6E150CUS-1.11]-
dmi.board.asset.tag: (none)
dmi.board.name: 69Y5698
dmi.board.vendor: IBM
dmi.board.version: (none)
dmi.chassis.asset.tag: none
dmi.chassis.type: 23
dmi.chassis.vendor: IBM
dmi.chassis.version: none
dmi.modalias: dmi:bvnIBMCorp.:bvr-[D6E150CUS-1.11]-:bd02/08/2011:svnIBM:pnSystemx3550M3-[7944AC1]-:pvr00:rvnIBM:rn69Y5698:rvr(none):cvnIBM:ct23:cvrnone:
dmi.product.name: System x3550 M3 -[7944AC1]-
dmi.product.version: 00
dmi.sys.vendor: IBM

Revision history for this message
alecm3 (alecm-chatango) wrote :
alecm3 (alecm-chatango)
description: updated
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
Revision history for this message
alecm3 (alecm-chatango) wrote :

I am not sure why it's marked "medium" importance: it keeps happening on all our production servers. Unfortunately, this bug will cause us to switch distributions.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@alecm3, do you have the option to test some kernels on a machine that exhibits this bug?

Changed in linux (Ubuntu):
importance: Medium → High
tags: added: kernel-da-key
Revision history for this message
Brad Figg (brad-figg) wrote :

@alecm3,

If you can:
  1. The kernel you are running is not the most recent in -updates. Please update one of your systems and see if the problem is still happening.
  2. If the problem still exists, please try one of our LTS backport packages. I'd suggest trying the Oneiric one and see if that makes a difference.

Revision history for this message
alecm3 (alecm-chatango) wrote :

I upgraded several machines to 2.6.32-41-server about a week ago, which seems to be the latest 10.04 LTS kernel. If the bug occurs on 2.6.32-41-server, I will add information immediately. I am not very optimistic about this upgrade, since the diff of fs/proc/array.c between 2.6.32-41-server and 2.6.32-40-server shows no changes, but there are substential diffs between 2.6.24-26-server and 2.6.32-32-server in that file, and that's the version when I first encountered the bug.

Revision history for this message
alecm3 (alecm-chatango) wrote :

@brad-figg
re: option 2
I could potentially try a backported kernel, do I need to use Oneiric or I can go with Pangolin, because Pangolin is LTS?
This upgrade would require some preparation and scheduling, I cannot do it immediately.

Revision history for this message
alecm3 (alecm-chatango) wrote :

I have a 2.6.32-40-server production machine that just entered this state after a month of normal operations. If anyone is interested, I can collect more data from it. So far,

# ps -o pid,pcpu,time -p3061
  PID %CPU TIME
 3061 14157 1184014533-20:50:00

# cat /proc/3061/stat
3061 (mysqld) S 2688 2682 1982 0 -1 4202752 6720120 0 9 0 72106386682 4611685975708793406 0 0 20 0 52 0 637627271 27166662656 5482170 18446744073709551615 4194304 11111617 140736788381792 140736788380800 8858659 0 552967 4102 26345 18446744073709551615 0 0 17 1 0 0 0 0 0

Revision history for this message
Stephen Croll (stephen-d-croll) wrote :
Download full text (5.3 KiB)

FYI: There are a pair of kernel commits that *might* solve this
issue, at least for newer kernels. The first can be found in
Linus's tree (as well as linux-next) and the second is currently
only in linux-next:

[scroll@scroll-kernel-vm linux-next]$ git remote -v
origin http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (fetch)
origin http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (push)

[scroll@scroll-kernel-vm linux-next]$ git log -n1 bea6832cc8c4a0a9a65dd17da6aaa657fe27bc3e
commit bea6832cc8c4a0a9a65dd17da6aaa657fe27bc3e
Author: Stanislaw Gruszka <email address hidden>
Date: Wed Aug 8 11:27:15 2012 +0200

    sched: fix divide by zero at {thread_group,task}_times

    On architectures where cputime_t is 64 bit type, is possible to trigger
    divide by zero on do_div(temp, (__force u32) total) line, if total is a
    non zero number but has lower 32 bit's zeroed. Removing casting is not
    a good solution since some do_div() implementations do cast to u32
    internally.

    This problem can be triggered in practice on very long lived processes:

      PID: 2331 TASK: ffff880472814b00 CPU: 2 COMMAND: "oraagent.bin"
       #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
       #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
       #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
       #3 [ffff880472a51cd0] die at ffffffff8100f26b
       #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
       #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
       #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
          [exception RIP: thread_group_times+0x56]
          RIP: ffffffff81056a16 RSP: ffff880472a51eb8 RFLAGS: 00010046
          RAX: bc3572c9fe12d194 RBX: ffff880874150800 RCX: 0000000110266fad
          RDX: 0000000000000000 RSI: ffff880472a51eb8 RDI: 001038ae7d9633dc
          RBP: ffff880472a51ef8 R8: 00000000b10a3a64 R9: ffff880874150800
          R10: 00007fcba27ab680 R11: 0000000000000202 R12: ffff880472a51f08
          R13: ffff880472a51f10 R14: 0000000000000000 R15: 0000000000000007
          ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
       #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
       #8 [ffff880472a51f40] sys_times at ffffffff81088524
       #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
          RIP: 0000003808caac3a RSP: 00007fcba27ab6d8 RFLAGS: 00000202
          RAX: 0000000000000064 RBX: ffffffff8100b0f2 RCX: 0000000000000000
          RDX: 00007fcba27ab6e0 RSI: 000000000076d58e RDI: 00007fcba27ab6e0
          RBP: 00007fcba27ab700 R8: 0000000000000020 R9: 000000000000091b
          R10: 00007fcba27ab680 R11: 0000000000000202 R12: 00007fff9ca41940
          R13: 0000000000000000 R14: 00007fcba27ac9c0 R15: 00007fff9ca41940
          ORIG_RAX: 0000000000000064 CS: 0033 SS: 002b

    Cc: <email address hidden>
    Signed-off-by: Stanislaw Gruszka <email address hidden>
    Signed-off-by: Peter Zijlstra <email address hidden>
    Link: http://<email address hidden>
    Signed-off-by: Thomas Gleixner <email address hidden>

[scroll@scroll-kernel-v...

Read more...

Revision history for this message
alecm3 (alecm-chatango) wrote :

@Stephen Croll :Do you know if this patch has been incorporated into 12.04LTS? What about 10.04LTS?

Revision history for this message
penalvch (penalvch) wrote :

alecm3, the patch mentioned by Stephen Croll is available in Precise+ as per http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=commit;h=a041ea7c6c7b5ab0b4cfc2d0d50fbab9a9f9602c . Would it be possible to test this out in Precise server via http://releases.ubuntu.com/precise/ and advise to the results?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.