Comment 3 for bug 1597908

Keith Busch (keith-busch) wrote :

This system crashes, so running apport-collect after the fact is not possible, but I can confirm this is a bug. As the upstream nvme driver maintainer, I can recommend either which driver commits need to be reverted or which kernel commit needs to be cherry-picked (preferring the latter :)).

Here is a snippet of stack trace:

<3>[51827.132142] BUG: scheduling while atomic: swapper/19/0/0x00000100
<4>[51827.242686] Modules linked in: nvme binfmt_misc PlxSvc(OE) ipmi_devintf intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass input_leds joydev sb_edac ipmi_ssif edac_core mei_me mei lpc_ich ioatdma shpchp ipmi_si ipmi_msghandler 8250_fintek acpi_pad acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear igb dca ptp ahci crct10dif_pclmul crc32_pclmul hid_generic mxm_wmi aesni_intel aes_x86_64 lrw gf128mul usbhid glue_helper ablk_helper pps_core cryptd hid libahci i2c_algo_bit fjes wmi
<4>[51827.242743] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G W OE 4.4.0-24-generic #43-Ubuntu
<4>[51827.242746] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.11.01.0132.060620160917 06/06/2016
<4>[51827.242748] 0000000000000286 374975818f2884ca ffff88105de43a98 ffffffff813eab23
<4>[51827.242752] ffff88105de56d00 0000000000000000 ffff88105de43aa8 ffffffff810a5ceb
<4>[51827.242762] ffff88105de43af8 ffffffff818217d6 ffff88105de43ac8 3749758100000013
<4>[51827.242765] Call Trace:
<4>[51827.242768] <IRQ> [<ffffffff813eab23>] dump_stack+0x63/0x90
<4>[51827.242781] [<ffffffff810a5ceb>] __schedule_bug+0x4b/0x60
<4>[51827.242788] [<ffffffff818217d6>] __schedule+0x726/0xa30
<4>[51827.242792] [<ffffffff81821b15>] schedule+0x35/0x80
<4>[51827.242797] [<ffffffff81824ba9>] schedule_timeout+0x129/0x270
<4>[51827.242802] [<ffffffff810ec480>] ? trace_event_raw_event_tick_stop+0x120/0x120
<4>[51827.242807] [<ffffffff810ec89d>] msleep+0x2d/0x40
<4>[51827.242813] [<ffffffffc02cd470>] nvme_wait_ready+0x90/0x100 [nvme]
<4>[51827.242818] [<ffffffffc02cee70>] nvme_disable_ctrl+0x40/0x50 [nvme]
<4>[51827.242823] [<ffffffffc02d1b3d>] nvme_disable_admin_queue+0x8d/0x90 [nvme]
<4>[51827.242828] [<ffffffffc02d1dde>] nvme_dev_disable+0x29e/0x2c0 [nvme]
<4>[51827.242833] [<ffffffffc02d03a0>] ? __nvme_process_cq+0x200/0x200 [nvme]
<4>[51827.242838] [<ffffffff8154955c>] ? dev_warn+0x6c/0x90
<4>[51827.242843] [<ffffffffc02d1ff0>] nvme_timeout+0x110/0x1d0 [nvme]
<4>[51827.242847] [<ffffffff813ea92f>] ? cpumask_next_and+0x2f/0x40
<4>[51827.242850] [<ffffffff810bd4bc>] ? load_balance+0x18c/0x980
<4>[51827.242854] [<ffffffff813c5cdf>] blk_mq_rq_timed_out+0x2f/0x70
<4>[51827.242857] [<ffffffff813c5d6e>] blk_mq_check_expired+0x4e/0x80
<4>[51827.242861] [<ffffffff813c86c8>] bt_for_each+0xd8/0xe0
<4>[51827.242864] [<ffffffff813c5d20>] ? blk_mq_rq_timed_out+0x70/0x70
<4>[51827.242868] [<ffffffff813c5d20>] ? blk_mq_rq_timed_out+0x70/0x70
<4>[51827.242871] [<ffffffff813c8ed7>] blk_mq_queue_tag_busy_iter+0x47/0xc0
<4>[51827.242875] [<ffffffff813c4a80>] ? blk_mq_attempt_merge+0xb0/0xb0
<4>[51827.242878] [<ffffffff813c4ac1>] blk_mq_rq_timer+0x41/0xf0
<4>[51827.242882] [<ffffffff810ec4c5>] call_timer_fn+0x35/0x120
<4>[51827.242885] [<ffffffff813c4a80>] ? blk_mq_attempt_merge+0xb0/0xb0
<4>[51827.242890] [<ffffffff810ece7a>] run_timer_softirq+0x23a/0x2f0
<4>[51827.242894] [<ffffffff81085b11>] __do_softirq+0x101/0x290
<4>[51827.242899] [<ffffffff81085e13>] irq_exit+0xa3/0xb0
<4>[51827.242902] [<ffffffff818286a2>] smp_apic_timer_interrupt+0x42/0x50
<4>[51827.242905] [<ffffffff81826962>] apic_timer_interrupt+0x82/0x90
<4>[51827.242907] <EOI> [<ffffffff816bcd21>] ? cpuidle_enter_state+0x111/0x2b0
<4>[51827.242914] [<ffffffff816bcef7>] cpuidle_enter+0x17/0x20
<4>[51827.242918] [<ffffffff810c3ec2>] call_cpuidle+0x32/0x60
<4>[51827.242921] [<ffffffff816bced3>] ? cpuidle_select+0x13/0x20
<4>[51827.242925] [<ffffffff810c4180>] cpu_startup_entry+0x290/0x350
<4>[51827.242929] [<ffffffff81051714>] start_secondary+0x154/0x190
<3>[51827.242934] bad: scheduling from the idle thread!
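
For anyone triaging a similar log: in the trace above, msleep() is reached through nvme_timeout() while the stack is still between the <IRQ> and <EOI> markers, that is, from the block-layer request timer running in softirq context, which is consistent with the "scheduling while atomic" message. The following is only a rough sketch of how that reading could be automated. The helper name `irq_frames`, the regular expression, and the shortened `SAMPLE` excerpt are illustrative assumptions, not part of any kernel or Ubuntu tooling.

```python
import re

# Matches frames such as "[<ffffffff810ec89d>] msleep+0x2d/0x40" and
# "[<ffffffffc02d03a0>] ? __nvme_process_cq+0x200/0x200 [nvme]".
# A leading "?" marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

# Functions that may sleep; hypothetical short list chosen for this trace.
SLEEPING = {"msleep", "schedule", "schedule_timeout"}


def irq_frames(log_text):
    """Return (symbol, module) for confirmed frames between <IRQ> and <EOI>."""
    frames = []
    in_irq = False
    for line in log_text.splitlines():
        if "<EOI>" in line:
            in_irq = False  # the <EOI> line itself is outside the IRQ stack
        if "<IRQ>" in line:
            in_irq = True   # the <IRQ> line carries the first IRQ frame
        if not in_irq:
            continue
        match = FRAME_RE.search(line)
        if match and not match.group("guess"):
            frames.append((match.group("sym"), match.group("mod")))
    return frames


# Shortened excerpt of the trace above, kept verbatim per line.
SAMPLE = """\
<4>[51827.242768] <IRQ> [<ffffffff813eab23>] dump_stack+0x63/0x90
<4>[51827.242807] [<ffffffff810ec89d>] msleep+0x2d/0x40
<4>[51827.242813] [<ffffffffc02cd470>] nvme_wait_ready+0x90/0x100 [nvme]
<4>[51827.242833] [<ffffffffc02d03a0>] ? __nvme_process_cq+0x200/0x200 [nvme]
<4>[51827.242843] [<ffffffffc02d1ff0>] nvme_timeout+0x110/0x1d0 [nvme]
<4>[51827.242894] [<ffffffff81085b11>] __do_softirq+0x101/0x290
<4>[51827.242907] <EOI> [<ffffffff816bcd21>] ? cpuidle_enter_state+0x111/0x2b0
<4>[51827.242914] [<ffffffff816bcef7>] cpuidle_enter+0x17/0x20
"""

if __name__ == "__main__":
    for sym, mod in irq_frames(SAMPLE):
        flag = "  <-- may sleep" if sym in SLEEPING else ""
        print(f"{sym} [{mod or 'vmlinux'}]{flag}")
```

Frames marked with "?" are skipped because they are unconfirmed stack entries, so they should not be used to decide which caller actually slept.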