BUG: soft lockup - CPU stuck for 22s! [md3_raid1]

Bug #1356558 reported by Uli Middelberg
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

This bug appeared in Ubuntu 14.04; Ubuntu 12.04 didn't show this behavior.
I've found a possible predecessor in bug #212684.
Switching to a recent mainline kernel didn't fix this issue.

Running

 $ /usr/share/mdadm/checkarray /dev/md3

reproduces this behavior.
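
For readers who want to trigger the same check without the wrapper script: checkarray starts a check by writing "check" to the array's sync_action file in sysfs. The following is a minimal Python sketch of that one step, not the script itself. The helper name start_check and the sysfs_root parameter are made up for illustration, and writing to the real sysfs path requires root.

```python
from pathlib import Path


def start_check(md_name: str, sysfs_root: str = "/sys/block") -> Path:
    """Request a data-check of an md array by writing 'check' to sync_action.

    md_name is the kernel name of the array, e.g. "md3".
    sysfs_root is a hypothetical parameter that exists only so the function
    can be tried against a scratch directory; on a real system it stays at
    /sys/block.
    """
    action_file = Path(sysfs_root) / md_name / "md" / "sync_action"
    if not action_file.exists():
        raise FileNotFoundError(
            f"{action_file} not found; is {md_name} an md array?"
        )
    # The md driver accepts keywords such as "check", "repair" and "idle" here.
    action_file.write_text("check\n")
    return action_file
```

Calling start_check("md3") as root should be roughly equivalent to the checkarray invocation above, although the script also does extra work (for example skipping arrays that are already syncing) that this sketch leaves out.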

/dev/md3 is an LVM PV:

  --- Physical volume ---
  PV Name /dev/md3
  VG Name local_vg1
  PV Size 1.36 TiB / not usable 2.25 MiB
  Allocatable yes
  PE Size 4.00 MiB
  Total PE 355619
  Free PE 31011
  Allocated PE 324608

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-33-generic 3.13.0-33.58
ProcVersionSignature: Ubuntu 3.13.0-33.58-generic 3.13.11.4
Uname: Linux 3.13.0-33-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.13.0-33-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.3
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
Date: Wed Aug 13 21:42:12 2014
HibernationDevice: RESUME=UUID=c4a03c5e-f650-4ca8-9dab-f0c4ad12c901
InstallationDate: Installed on 2014-07-06 (37 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release i386 (20140416.2)
IwConfig:
 lo no wireless extensions.

 em1 no wireless extensions.
ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-33-generic root=UUID=9c2715cb-dd8a-4687-9537-009e1e4e83d8 ro nomdmonddf nomdmonisw
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-33-generic N/A
 linux-backports-modules-3.13.0-33-generic N/A
 linux-firmware 1.127.5
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/19/2010
dmi.bios.vendor: Intel Corp.
dmi.bios.version: JT94510H.86A.0045.2010.0519.1750
dmi.board.name: D945GSEJT
dmi.board.vendor: Intel Corporation
dmi.board.version: AAE57850-300
dmi.chassis.type: 3
dmi.modalias: dmi:bvnIntelCorp.:bvrJT94510H.86A.0045.2010.0519.1750:bd05/19/2010:svn:pn:pvr:rvnIntelCorporation:rnD945GSEJT:rvrAAE57850-300:cvn:ct3:cvr:

Revision history for this message
Uli Middelberg (uli-k) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

v3.13 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-trusty/
v3.13.5: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.5-trusty/
v3.13.11.4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11.4-trusty/

You don't have to test every kernel, just up until the kernel that first has this bug.

Thanks in advance!

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: performing-bisect
Revision history for this message
Uli Middelberg (uli-k) wrote :
Revision history for this message
Uli Middelberg (uli-k) wrote :
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Ok, so it sounds like the regression does eventually happen with 3.13 Final. We should probably test some earlier kernels to find the last kernel version where the regression did not happen. Can you test the following kernels and see which one does not hit the regression? You will probably need to run each test for about as long as it took to hit the issue in 3.13 Final:

3.12 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-trusty/
3.11 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11-saucy/

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.12 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-trusty/ doesn't seem to have this regression

Revision history for this message
Uli Middelberg (uli-k) wrote :

Is there any kernel I should try next?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

We should try some of the 3.13 release candidates since 3.12 final does not have the bug.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

v3.13-rc3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc3-trusty/

If v3.13-rc3 does not exhibit the bug then test v3.13-rc6:
v3.13-rc6: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc6-trusty/

If v3.13-rc3 does exhibit the bug then test v3.13-rc2:
v3.13-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc2-trusty

You don't have to test every kernel, just up until the kernel that first has this bug.

Thanks in advance!

Revision history for this message
Uli Middelberg (uli-k) wrote :

v3.13-rc3: doesn't exhibit this bug
v3.13-rc6: exhibits this bug

I'll try v3.13-rc4 and v3.13-rc5 next, but http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc4-trusty/ seems to be incomplete

Revision history for this message
Uli Middelberg (uli-k) wrote :

v3.13-rc5: exhibits this bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I'll start a bisect between v3.13-rc3 and v3.13-rc5. It will require testing about 7 - 10 test kernels. Some of the kernels should exhibit the bug, while some should not. I'll post the first test kernel shortly.
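
The "7 - 10 test kernels" estimate matches how a binary search behaves: each test roughly halves the remaining range of candidate commits. A small Python sketch of that arithmetic follows. The range sizes used in the note below it are hypothetical and were not counted from the real kernel tree, and git bisect on a history with merges can differ from this by a step or so.

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case number of test builds for a binary search over
    num_commits candidate commits, i.e. ceil(log2(num_commits))."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating-point rounding.
    return (num_commits - 1).bit_length()
```

Under these assumptions a range of 128 commits needs 7 tests and a range of 1024 commits needs 10, so the estimate would correspond to a few hundred commits between the two release candidates.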

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v3.13-rc3 and v3.13-rc5. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
308d17ef9530f236466a31a7855fc3d5176292d4

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

penalvch (penalvch)
tags: added: latest-bios-0045
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Uli Middelberg (uli-k) wrote :

308d17ef9530f236466a31a7855fc3d5176292d4: doesn't exhibit this bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

308d17ef9530f236466a31a7855fc3d5176292d4: exhibits this bug

Sorry for the confusion.

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
c6c1f325adc8a8e0cd06c6ad0ca232a6880a1783

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

c6c1f325adc8a8e0cd06c6ad0ca232a6880a1783: exhibits this bug:

Sep 8 22:18:29 box kernel: [ 1940.144024] BUG: soft lockup - CPU#0 stuck for 22s! [md3_raid1:172]
Sep 8 22:18:29 box kernel: [ 1940.144024] Modules linked in: xt_multiport iptable_filter ip_tables x_tables gpio_ich snd_hda_codec_realtek coretemp snd_hda_intel snd_hda_codec serio_raw snd_hwdep pl2303 usblp usbserial lpc_ich snd_pcm snd_page_alloc snd_timer i915 drm_kms_helper snd drm soundcore i2c_algo_bit video mac_hid parport_pc ppdev lp parport dm_snapshot raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ahci psmouse r8169 sata_via libahci raid6_pq mii raid1 raid0 multipath linear
Sep 8 22:18:29 box kernel: [ 1940.144024] CPU: 0 PID: 172 Comm: md3_raid1 Not tainted 3.13.0-031300rc1-generic #201409081748
Sep 8 22:18:29 box kernel: [ 1940.144024] Hardware name: /D945GSEJT, BIOS JT94510H.86A.0045.2010.0519.1750 05/19/2010
Sep 8 22:18:29 box kernel: [ 1940.144024] task: f6888cf0 ti: f6b68000 task.ti: f6b68000
Sep 8 22:18:29 box kernel: [ 1940.144024] EIP: 0060:[<c12ff0b2>] EFLAGS: 00000297 CPU: 0
Sep 8 22:18:29 box kernel: [ 1940.144024] EIP is at memcmp+0x32/0x60
Sep 8 22:18:29 box kernel: [ 1940.144024] EAX: ecb27000 EBX: 0000007e ECX: 00000962 EDX: ec8de000
Sep 8 22:18:29 box kernel: [ 1940.144024] ESI: 0000007e EDI: 00000fff EBP: f6b69e9c ESP: f6b69e8c
Sep 8 22:18:29 box kernel: [ 1940.144024] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Sep 8 22:18:29 box kernel: [ 1940.144024] CR0: 8005003b CR2: bfc95ef4 CR3: 01ae4000 CR4: 000007f0
Sep 8 22:18:29 box kernel: [ 1940.144024] Stack:
Sep 8 22:18:29 box kernel: [ 1940.144024] 00000000 00000006 00001000 00000048 f6b69ee0 f845d3b3 ec8de000 00000001
Sep 8 22:18:29 box kernel: [ 1940.144024] 0000000f f6847800 00000007 ecb9e800 f69d2f80 f50da4e0 0000000c ecb9ea00
Sep 8 22:18:29 box kernel: [ 1940.144024] 0000001c ecd53300 00000002 ecd53300 f6847800 f6b69f00 f845d667 f6847800
Sep 8 22:18:29 box kernel: [ 1940.144024] Call Trace:
Sep 8 22:18:29 box kernel: [ 1940.144024] [<f845d3b3>] process_checks+0x173/0x300 [raid1]
Sep 8 22:18:29 box kernel: [ 1940.144024] [<f845d667>] sync_request_write+0x127/0x160 [raid1]
Sep 8 22:18:29 box kernel: [ 1940.144024] [<f845f5d2>] raid1d+0x102/0x140 [raid1]
Sep 8 22:18:29 box kernel: [ 1940.144024] [<c151bf54>] md_thread+0xe4/0x110
Sep 8 22:18:29 box kernel: [ 1940.144024] [<c1095010>] ? __wake_up_sync+0x20/0x20
Sep 8 22:18:29 box kernel: [ 1940.144024] [<c151be70>] ? md_rdev_init+0x100/0x100
Sep 8 22:18:29 box kernel: [ 1940.144024] [<c107801b>] kthread+0x9b/0xb0
Sep 8 22:18:29 box kernel: [ 1940.144024] [<c1684077>] ret_from_kernel_thread+0x1b/0x28
Sep 8 22:18:29 box kernel: [ 1940.144024] [<c1077f80>] ? flush_kthread_worker+0x90/0x90
Sep 8 22:18:29 box kernel: [ 1940.144024] Code: ec 04 85 c9 c7 45 f0 00 00 00 00 74 29 0f b6 30 0f b6 1a 29 de 89 75 f0 75 1c 8d 79 ff 31 c9 eb 11 0f b6 74 08 01 0f b6 5c 0a 01 <83> c1 01 29 de 75 0f 39 f9 75 eb 8b 45 f0 83 c4 04 5b 5e 5f 5d
Sep 8 22:38:36 box mdadm[1615]: Rebuild21 event detected on md device /dev/md/03
Sep 8 22:41:05 box kernel: [ 3296.144025] BUG: soft lockup - CPU#0 s...
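
When comparing many test kernels it can help to pull the key fields out of the watchdog line automatically. The sketch below is one way to do that in Python, assuming the line format shown in the log above; the names SoftLockup and parse_soft_lockup are invented for this example and are not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional


class SoftLockup(NamedTuple):
    uptime: float   # seconds since boot, from the [ ... ] timestamp
    cpu: int        # CPU number reported as stuck
    seconds: int    # how long the watchdog says the CPU was stuck
    comm: str       # name of the task that was running
    pid: int        # PID of that task


# Matches e.g. "[ 1940.144024] BUG: soft lockup - CPU#0 stuck for 22s! [md3_raid1:172]"
_PATTERN = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a complete
    soft-lockup report (for example a truncated line)."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return SoftLockup(
        uptime=float(m["uptime"]),
        cpu=int(m["cpu"]),
        seconds=int(m["secs"]),
        comm=m["comm"],
        pid=int(m["pid"]),
    )
```

Feeding it each line of a syslog or dmesg capture and keeping the non-None results gives a list of lockup events that can be compared across kernels.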


Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Trusty test kernel with a revert of commit c6c1f325.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? If it does still exhibit the bug, I'll have to look at the bisect results further.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

Hi Joseph,

unfortunately the kernel you are offering for testing is for the amd64 platform, but I need it for the i386 platform.

Regards
Uli

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built an i386 Trusty test kernel with a revert of commit c6c1f325.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not?

Revision history for this message
Uli Middelberg (uli-k) wrote :

I'd like to test the kernel, but this particular build comes up without support for networking or the USB keyboard. I've attached the dmesg.gz.

Revision history for this message
Uli Middelberg (uli-k) wrote :

The v3.13-rc3 build instead is running well.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Did you install both the linux-image and linux-image-extra .deb packages for my test kernel?

Revision history for this message
Uli Middelberg (uli-k) wrote :

OK, I hadn't installed the linux-image-extra package; it wasn't necessary with the other kernels I've tested before. With this package installed, the kernel boots properly, but

 c6c1f325: exhibits this bug.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

So the test kernel posted in comment #20 does fix this bug? It has commit c6c1f325 reverted.

Revision history for this message
Uli Middelberg (uli-k) wrote :

Hello Joseph,

the last kernel you prepared for testing contains this bug.

I've just noticed that the 3.13.0-031300rc3-generic kernel contains the bug as well, so the bisect went in the wrong direction. Sorry for this. I'll take a deeper look, hoping to understand why the bug didn't appear during the first testing.

[44094.835704] md: data-check of RAID array md1
[44094.835716] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[44094.835723] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[44094.835732] md: using 128k window, over a total of 4192192k.
[44095.909810] type=1400 audit(1411015241.514:35): apparmor="STATUS" operation="profile_replace" name="/usr/lib/cups/backend/cups-pdf" pid=8865 comm="apparmor_parser"
[44095.909839] type=1400 audit(1411015241.514:36): apparmor="STATUS" operation="profile_replace" name="/usr/sbin/cupsd" pid=8865 comm="apparmor_parser"
[44095.912967] type=1400 audit(1411015241.518:37): apparmor="STATUS" operation="profile_replace" name="/usr/sbin/cupsd" pid=8865 comm="apparmor_parser"
[44403.916160] INFO: task md1_resync:8767 blocked for more than 120 seconds.
[44403.916448] Not tainted 3.13.0-031300rc3-generic #201312061335
[44403.916692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44403.916988] md1_resync D c10a8e24 0 8767 2 0x00000000
[44403.917009] ebc1bd5c 00000046 ebc1bce8 c10a8e24 00000092 20e0b127 00002833 c1af0400
[44403.917039] c105bbd2 c1af0400 f7bde400 e794b400 f7210d00 00000000 f6a8be48 ebc1bd1c
[44403.917066] ebc1bd30 f6a8be00 f6a8be48 ebc1bd70 ebc1bd90 c16998b3 00000082 e794b400
[44403.917093] Call Trace:
[44403.917122] [<c10a8e24>] ? irq_to_desc+0x14/0x20
[44403.917141] [<c105bbd2>] ? irq_exit+0x62/0xa0
[44403.917162] [<c16998b3>] ? common_interrupt+0x33/0x38
[44403.917178] [<c1093f31>] ? prepare_to_wait_event+0x71/0xd0
[44403.917195] [<c168e733>] schedule+0x23/0x60
[44403.917247] [<f8444e45>] raise_barrier+0x125/0x1c0 [raid1]
[44403.917263] [<c1094010>] ? __wake_up_sync+0x20/0x20
[44403.917291] [<f8444fab>] sync_request+0xcb/0xad0 [raid1]
[44403.917312] [<c1534b9f>] md_do_sync+0x9bf/0x1000
[44403.917341] [<f8444ee0>] ? raise_barrier+0x1c0/0x1c0 [raid1]
[44403.917361] [<c10653d7>] ? recalc_sigpending+0x17/0x50
[44403.917379] [<c1531da0>] ? md_rdev_init+0x100/0x100
[44403.917395] [<c1531e84>] md_thread+0xe4/0x110
[44403.917409] [<c1093adf>] ? __wake_up_locked+0x1f/0x30
[44403.917426] [<c1531da0>] ? md_rdev_init+0x100/0x100
[44403.917441] [<c107625b>] kthread+0x9b/0xb0
[44403.917459] [<c1699337>] ret_from_kernel_thread+0x1b/0x28
[44403.917474] [<c10761c0>] ? flush_kthread_worker+0x90/0x90
[44523.916155] INFO: task md1_resync:8767 blocked for more than 120 seconds.
[44523.916433] Not tainted 3.13.0-031300rc3-generic #201312061335
[44523.916676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44523.916972] md1_resync D c10a8e24 0 8767 2 0x00000000
[44523.916993] ebc1bd5c 00000046 ebc1bce8 c10a8e24 00000092 20e0b127 00002833 c1af0400
[44523.917022] c105bbd2 c1af0400 f7bde400 e794b400 f7210d00 00000000 f6a8be48 ebc...
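
One plausible reason the first rc3 test looked clean is that the problem is intermittent, so a single check run can pass by chance. As a rough Python sketch, assuming each run hits the bug independently with some probability p_hit (a hypothetical number that was not measured in this report), the chance of wrongly marking a buggy kernel as clean after n runs is (1 - p_hit)^n:

```python
import math


def false_clean_probability(p_hit: float, runs: int) -> float:
    """Probability that a buggy kernel passes every one of `runs` checks,
    assuming independent runs that each hit the bug with probability p_hit."""
    if not 0.0 <= p_hit <= 1.0:
        raise ValueError("p_hit must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - p_hit) ** runs


def runs_needed(p_hit: float, max_false_clean: float) -> int:
    """Smallest number of runs that pushes the false-clean probability
    down to max_false_clean or below."""
    if not 0.0 < p_hit < 1.0:
        raise ValueError("p_hit must be strictly between 0 and 1")
    if not 0.0 < max_false_clean < 1.0:
        raise ValueError("max_false_clean must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_clean) / math.log(1.0 - p_hit))
```

With an assumed p_hit of 0.5, two runs still leave a 25% chance of a false "clean", and five runs are needed to get below 5%. Repeating the check on every kernel that looks clean is the practical consequence.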

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.13.0-031300rc2-generic: exhibits this bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.13.0-031300rc1-generic: exhibits this bug

I'll try v3.12.14-trusty next

Revision history for this message
Uli Middelberg (uli-k) wrote :

For the record:

3.12 final: clean
...
3.13.0-031300rc1-generic: bug
3.13.0-031300rc2-generic: bug
3.13.0-031300rc3-generic: bug
3.13.0-031300rc5-generic: bug
3.13.0-031300rc6-generic: bug
3.13 final: bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.12 final: clean
3.12.14: clean
...
3.13.0-031300rc1-generic: bug
3.13.0-031300rc2-generic: bug
3.13.0-031300rc3-generic: bug
3.13.0-031300rc5-generic: bug
3.13.0-031300rc6-generic: bug
3.13 final: bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.12 final: clean
3.12.14: clean
3.12.21: clean
...
3.13.0-031300rc1-generic: bug
3.13.0-031300rc2-generic: bug
3.13.0-031300rc3-generic: bug
3.13.0-031300rc5-generic: bug
3.13.0-031300rc6-generic: bug
3.13 final: bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.12 final: clean
3.12.14: clean
3.12.21: clean
3.12.25: clean
...
3.13.0-031300rc1-generic: bug
3.13.0-031300rc2-generic: bug
3.13.0-031300rc3-generic: bug
3.13.0-031300rc5-generic: bug
3.13.0-031300rc6-generic: bug
3.13 final: bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.12 final: clean
3.12.14: clean
3.12.21: clean
3.12.25: clean
3.12.27: clean
...
3.13.0-031300rc1-generic: bug
3.13.0-031300rc2-generic: bug
3.13.0-031300rc3-generic: bug
3.13.0-031300rc5-generic: bug
3.13.0-031300rc6-generic: bug
3.13 final: bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

3.12 final: clean
3.12.14: clean
3.12.21: clean
3.12.25: clean
3.12.27: clean
3.12.28: clean
3.13.0-031300rc1-generic: bug
3.13.0-031300rc2-generic: bug
3.13.0-031300rc3-generic: bug
3.13.0-031300rc5-generic: bug
3.13.0-031300rc6-generic: bug
3.13 final: bug

Is there any kernel version before 3.13.0-031300rc1 I should test?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

It looks like the bug was introduced in v3.13-rc1. I'll start a kernel bisect between v3.12 final and v3.13-rc1 and post a test kernel shortly.
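
The conclusion above follows from reading the result list in order and finding where "clean" turns into "bug". A small Python sketch of that check is below. The function name find_boundary is invented for this example, and listing the 3.12.y stable kernels before rc1 is a simplification, since they live on a separate branch rather than on the path from v3.12 to v3.13-rc1.

```python
from typing import List, Optional, Tuple


def find_boundary(
    results: List[Tuple[str, str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Given (version, verdict) pairs ordered oldest to newest, with verdict
    "clean" or "bug", return (last clean version, first buggy version)."""
    last_clean: Optional[str] = None
    first_bug: Optional[str] = None
    for version, verdict in results:
        if verdict == "clean":
            if first_bug is not None:
                # A clean result after a buggy one means the data cannot be
                # bisected as-is (or a test was a false negative).
                raise ValueError(
                    f"{version} is clean but the older {first_bug} showed the bug"
                )
            last_clean = version
        elif verdict == "bug":
            if first_bug is None:
                first_bug = version
        else:
            raise ValueError(f"unknown verdict {verdict!r} for {version}")
    return last_clean, first_bug


# Results as reported in this thread so far.
RESULTS = [
    ("3.12 final", "clean"),
    ("3.12.14", "clean"),
    ("3.12.21", "clean"),
    ("3.12.25", "clean"),
    ("3.12.27", "clean"),
    ("3.12.28", "clean"),
    ("3.13.0-031300rc1-generic", "bug"),
    ("3.13.0-031300rc2-generic", "bug"),
    ("3.13.0-031300rc3-generic", "bug"),
    ("3.13.0-031300rc5-generic", "bug"),
    ("3.13.0-031300rc6-generic", "bug"),
    ("3.13 final", "bug"),
]
```

Calling find_boundary(RESULTS) returns ("3.12.28", "3.13.0-031300rc1-generic"). The ValueError branch is the situation hit earlier in this thread, where an initially "clean" rc3 result sat between buggy kernels.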

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v3.12 final and v3.13-rc1. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug

[ 908.144025] BUG: soft lockup - CPU#0 stuck for 22s! [md3_raid1:173]
[ 908.144025] Modules linked in: xt_multiport iptable_filter ip_tables x_tables gpio_ich snd_hda_codec_realtek coretemp snd_hda_intel serio_raw snd_hda_codec pl2303 usbserial snd_hwdep usblp i915 lpc_ich snd_pcm drm_kms_helper snd_page_alloc video drm snd_timer snd mac_hid i2c_algo_bit soundcore parport_pc ppdev lp parport dm_snapshot raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ahci psmouse raid6_pq raid1 raid0 libahci r8169 sata_via multipath mii linear
[ 908.144025] CPU: 0 PID: 173 Comm: md3_raid1 Not tainted 3.12.0-031200-generic #201409241615
[ 908.144025] Hardware name: /D945GSEJT, BIOS JT94510H.86A.0045.2010.0519.1750 05/19/2010
[ 908.144025] task: f6af26d0 ti: f6b82000 task.ti: f6b82000
[ 908.144025] EIP: 0060:[<c12f881d>] EFLAGS: 00000297 CPU: 0
[ 908.144025] EIP is at memcmp+0x2d/0x60
[ 908.144025] EAX: ecdcf000 EBX: 00000090 ECX: 00000270 EDX: ed369000
[ 908.144025] ESI: 000000a7 EDI: 00000fff EBP: f6b83ea4 ESP: f6b83e94
[ 908.144025] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 908.144025] CR0: 8005003b CR2: b91bb5dc CR3: 01acb000 CR4: 000007f0
[ 908.144025] Stack:
[ 908.144025] 00000000 0000000b 00001000 00000084 f6b83ee8 f85d42e3 ed369000 00000001
[ 908.144025] 0000000f f6b04800 0000000c ecc0e600 f6982d00 f50fd9e0 0000000c ecc0fa00
[ 908.144025] 0000001c ed263000 00000002 ed263000 f6b04800 f6b83f08 f85d4597 f6b04800
[ 908.144025] Call Trace:
[ 908.144025] [<f85d42e3>] process_checks+0x173/0x300 [raid1]
[ 908.144025] [<f85d4597>] sync_request_write+0x127/0x160 [raid1]
[ 908.144025] [<f85d6222>] raid1d+0x102/0x140 [raid1]
[ 908.144025] [<c1510d24>] md_thread+0xe4/0x110
[ 908.144025] [<c1095340>] ? __wake_up_sync+0x20/0x20
[ 908.144025] [<c1510c40>] ? md_rdev_init+0x100/0x100
[ 908.144025] [<c1077ebb>] kthread+0x9b/0xb0
[ 908.144025] [<c1674a77>] ret_from_kernel_thread+0x1b/0x28
[ 908.144025] [<c1077e20>] ? flush_kthread_worker+0x90/0x90
[ 908.144025] Code: e5 57 56 53 83 ec 04 85 c9 c7 45 f0 00 00 00 00 74 29 0f b6 30 0f b6 1a 29 de 89 75 f0 75 1c 8d 79 ff 31 c9 eb 11 0f b6 74 08 01 <0f> b6 5c 0a 01 83 c1 01 29 de 75 0f 39 f9 75 eb 8b 45 f0 83 c4

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: seems to be clean, did the check twice.

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
f095ca6b31cfd20e6e7e0338ed8548d8a4374287

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

f095ca6b31cfd20e6e7e0338ed8548d8a4374287: seems to be clean, did the check twice.

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
c2d33069915d1f9b3b1dcc2199af11d4e072b037

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

c2d33069915d1f9b3b1dcc2199af11d4e072b037: exhibits this bug

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
2026d24ef2ea8caad5e87662a58075e930ccab63

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

2026d24ef2ea8caad5e87662a58075e930ccab63: clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
a6bc732b5a96b5403c2637e85c350b95ec6591f3

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
23b4faa9a36257e75dade0f2945bc3e487e6f463

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

23b4faa9a36257e75dade0f2945bc3e487e6f463: clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Uli Middelberg (uli-k) wrote :

Is there anything I should test next?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
8a5dc585d50015af9c079ae2d182dc4c1cd22914

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

8a5dc585d50015af9c079ae2d182dc4c1cd22914: bug

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean
8a5dc585d50015af9c079ae2d182dc4c1cd22914: bug
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
5e1109adde6acd0f3424886da09402ac22ed244b

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

5e1109adde6acd0f3424886da09402ac22ed244b: clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean
5e1109adde6acd0f3424886da09402ac22ed244b: clean
8a5dc585d50015af9c079ae2d182dc4c1cd22914: bug
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
a3183c60e3e9be7abd830ebed904491625e07d2e

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

a3183c60e3e9be7abd830ebed904491625e07d2e: clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean
5e1109adde6acd0f3424886da09402ac22ed244b: clean
a3183c60e3e9be7abd830ebed904491625e07d2e: clean
8a5dc585d50015af9c079ae2d182dc4c1cd22914: bug
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
94daf85e3c4db3b804205277eec7c4eae6efe9db

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

94daf85e3c4db3b804205277eec7c4eae6efe9db is clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean
5e1109adde6acd0f3424886da09402ac22ed244b: clean
a3183c60e3e9be7abd830ebed904491625e07d2e: clean
94daf85e3c4db3b804205277eec7c4eae6efe9db: clean
8a5dc585d50015af9c079ae2d182dc4c1cd22914: bug
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
9da8312048edcf246ac1d7ab6aa0293f252de559

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

9da8312048edcf246ac1d7ab6aa0293f252de559 is clean

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean
5e1109adde6acd0f3424886da09402ac22ed244b: clean
a3183c60e3e9be7abd830ebed904491625e07d2e: clean
94daf85e3c4db3b804205277eec7c4eae6efe9db: clean
9da8312048edcf246ac1d7ab6aa0293f252de559: clean
8a5dc585d50015af9c079ae2d182dc4c1cd22914: bug
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
eeab517b68beb9e044e869bee18e3bdfa60e5aca

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1356558

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Uli Middelberg (uli-k) wrote :

eeab517b68beb9e044e869bee18e3bdfa60e5aca exhibits the bug (quite early this time, only 100s after starting the checks)

3.12 final: clean
f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0: clean
f095ca6b31cfd20e6e7e0338ed8548d8a4374287: clean
2026d24ef2ea8caad5e87662a58075e930ccab63: clean
a6bc732b5a96b5403c2637e85c350b95ec6591f3: clean
23b4faa9a36257e75dade0f2945bc3e487e6f463: clean
86467ff2ddca94c0d8d10b92b5916e68c0cad8a9: clean
5e1109adde6acd0f3424886da09402ac22ed244b: clean
a3183c60e3e9be7abd830ebed904491625e07d2e: clean
94daf85e3c4db3b804205277eec7c4eae6efe9db: clean
9da8312048edcf246ac1d7ab6aa0293f252de559: clean
eeab517b68beb9e044e869bee18e3bdfa60e5aca: bug
8a5dc585d50015af9c079ae2d182dc4c1cd22914: bug
c2d33069915d1f9b3b1dcc2199af11d4e072b037: bug
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74: bug
3.13.0-031300rc1-generic: bug
3.13 final: bug

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The bisect reported eeab517 as the bad commit. However, this is a merge, so it can't be easily reverted. It will require further investigation.

Can you see if this bug also exists in the 3.18-rc4 kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-rc4-vivid/
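
For readers following the test-kernel rounds above: each round halves the range between the last known clean commit and the first known buggy one. The sketch below shows that idea on a plain list. It is only an illustration, not how `git bisect` is implemented (git walks a commit graph with merges, which is exactly why the result here landed on a merge commit). The names `first_bad`, `is_bad` and the commit labels `c0`..`c7` are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known clean, commits[-1] is known buggy, and
    every commit after the first buggy one is also buggy.
    """
    lo, hi = 0, len(commits) - 1  # lo: known clean, hi: known buggy
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression was introduced at mid or earlier
        else:
            lo = mid  # regression was introduced after mid
    return commits[hi]


# Hypothetical linear history in which the bug first appears in "c5".
history = [f"c{i}" for i in range(8)]
culprit = first_bad(history, lambda c: history.index(c) >= 5)
print(culprit)
```

In the thread, the `is_bad` step is the manual part: boot the test kernel, run checkarray, and wait to see whether the soft lockup shows up.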

Revision history for this message
Uli Middelberg (uli-k) wrote :

I tried the 3.18-rc4 kernel, the bug is also there:

Nov 16 07:16:39 box kernel: [34280.152007] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [clamscan:2868]
Nov 16 07:16:39 box kernel: [34280.152007] Modules linked in: xt_multiport iptable_filter ip_tables x_tables gpio_ich snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_hda_intel serio_raw snd_hda_controller pl2303 snd_hda_codec i915 usbserial lpc_ich snd_hwdep snd_pcm drm_kms_helper snd_timer 8250_fintek snd drm soundcore video mac_hid i2c_algo_bit parport_pc ppdev lp parport dm_snapshot dm_bufio raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ahci psmouse libahci r8169 raid6_pq sata_via raid1 raid0 multipath mii linear
Nov 16 07:16:39 box kernel: [34280.152007] CPU: 1 PID: 2868 Comm: clamscan Not tainted 3.18.0-031800rc4-generic #201411091835
Nov 16 07:16:39 box kernel: [34280.152007] Hardware name: /D945GSEJT, BIOS JT94510H.86A.0045.2010.0519.1750 05/19/2010
Nov 16 07:16:39 box kernel: [34280.152007] task: f6acb100 ti: ec63a000 task.ti: ec63a000
Nov 16 07:16:39 box kernel: [34280.152007] EIP: 0060:[<c115e7fa>] EFLAGS: 00000246 CPU: 1
Nov 16 07:16:39 box kernel: [34280.152007] EIP is at compact_finished+0xea/0x150
Nov 16 07:16:39 box kernel: [34280.152007] EAX: 00000002 EBX: ec63bb20 ECX: 00000008 EDX: 00000009
Nov 16 07:16:39 box kernel: [34280.152007] ESI: c1a41ac0 EDI: 00000002 EBP: ec63bad8 ESP: ec63bac0
Nov 16 07:16:39 box kernel: [34280.152007] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Nov 16 07:16:39 box kernel: [34280.152007] CR0: 80050033 CR2: a5000000 CR3: 2c0d6000 CR4: 000007f0
Nov 16 07:16:39 box kernel: [34280.152007] Stack:
Nov 16 07:16:39 box kernel: [34280.152007] 00000000 00000000 00000002 ec63bb20 c1a41ac0 c1a41ac0 ec63bb18 c1160062
Nov 16 07:16:39 box kernel: [34280.152007] ec63bb20 00000000 00000000 fffff000 00000014 000377fe 000377fe 00000000
Nov 16 07:16:39 box kernel: [34280.152007] ec63bb28 00000002 00001000 c1a41ac0 004352da ec63bb58 ec63bb64 c116043c
Nov 16 07:16:39 box kernel: [34280.152007] Call Trace:
Nov 16 07:16:39 box kernel: [34280.152007] [<c1160062>] compact_zone+0x112/0x360
Nov 16 07:16:39 box kernel: [34280.152007] [<c116043c>] compact_zone_order+0x4c/0x70
Nov 16 07:16:39 box kernel: [34280.152007] [<c1160536>] try_to_compact_pages+0xd6/0x280
Nov 16 07:16:39 box kernel: [34280.152007] [<c16b5de8>] __alloc_pages_direct_compact+0x5b/0x167
Nov 16 07:16:39 box kernel: [34280.152007] [<c1146d4b>] __alloc_pages_nodemask+0x4fb/0x910
Nov 16 07:16:39 box kernel: [34280.152007] [<c1195cd0>] ? commit_charge+0x20/0x70
Nov 16 07:16:39 box kernel: [34280.152007] [<c1190887>] do_huge_pmd_wp_page+0xf7/0x570
Nov 16 07:16:39 box kernel: [34280.152007] [<c11685cf>] ? handle_pte_fault+0x1bf/0x1f0
Nov 16 07:16:39 box kernel: [34280.152007] [<c11688c1>] __handle_mm_fault+0x211/0x290
Nov 16 07:16:39 box kernel: [34280.152007] [<c1168a27>] handle_mm_fault+0xe7/0x150
Nov 16 07:16:39 box kernel: [34280.152007] [<c104d600>] ? trace_do_page_fault+0xd0/0xd0
Nov 16 07:16:39 box kernel: [34280.152007] [<c104d197>] __do_page_fault+0x187/0x520
Nov 16 07:16:39 box kernel: [34280.152007] [<c1143c20>] ? ...
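
When comparing several such reports, it can help to pull the CPU number, stall duration and affected task out of the "soft lockup" line automatically. Below is a minimal sketch that assumes the message format seen in the log above (`BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`); other kernel versions may word the line differently. The names `LOCKUP_RE`, `Lockup` and `parse_lockup` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "BUG: soft lockup - CPU#1 stuck for 22s! [clamscan:2868]".
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


class Lockup(NamedTuple):
    cpu: int
    seconds: int
    comm: str
    pid: int


def parse_lockup(line: str) -> Optional[Lockup]:
    """Return the lockup details from one log line, or None if absent."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return Lockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))


sample = (
    "Nov 16 07:16:39 box kernel: [34280.152007] NMI watchdog: "
    "BUG: soft lockup - CPU#1 stuck for 22s! [clamscan:2868]"
)
print(parse_lockup(sample))
```

Feeding it every line of a syslog file and keeping the non-None results gives a quick overview of which tasks were stuck and for how long.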

Revision history for this message
Uli Middelberg (uli-k) wrote :

If I completely disable sound output, or even the whole sound subsystem, do you think that would make this bug less likely to appear?

Revision history for this message
Uli Middelberg (uli-k) wrote :

Is there anything I can do next?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

This still needs further investigation.

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Revision history for this message
Uli Middelberg (uli-k) wrote :

Hello Joseph,

Before filing an upstream bug report, I tried the first stable release of 3.18 [0], and I have not been able to reproduce the bug so far. So you may suspend this bug report or keep it on hold. I'd really like to know whether a specific upstream commit fixed this bug. Thank you so far.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-vivid/

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired