Kernel hang on drive pull caused by regression introduced by commit 287922eb0b18

Bug #1791790 reported by Steven Haber
This bug affects 1 person
Affects         Status        Importance  Assigned to       Milestone
linux (Ubuntu)  Fix Released  High        Joseph Salisbury
Xenial          Fix Released  High        Joseph Salisbury

Bug Description

== SRU Justification ==
The following commit was applied to Xenial and introduced this
regression:
287922eb0b18 ("block: defer timeouts to a workqueue")

This regression was introduced in mainline as of v4.5-rc1. Bionic was
also affected by this regression, but it already got the fix when commit
4e9b6f20828a was applied to mainline in v4.15-rc1.

The regression causes a kernel hang when a drive is pulled. The HPSA
driver has a tendency to aggressively remove missing devices, which widens
the race.

== Fix ==
4e9b6f20828a ("block: Fix a race between blk_cleanup_queue() and timeout handling")

== Regression Potential ==
Low. This commit fixes a regression and has been cc'd to stable, so it
has had additional upstream review. This commit is already applied to
Bionic and Cosmic.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

A bug was introduced when backporting the fix for http://bugs.launchpad.net/bugs/1597908. This bug exists in all Ubuntu 16.04 LTS 4.4 kernels >= 4.4.0-36, and many other non-LTS kernels.

This patch changes the context in which timeout work is scheduled for block devices in the kernel. Previously, timeout work was executed directly from the timer callback that fired when a deadline was met. After the patch, timeout work is scheduled using a background work queue. This means that by the time the work executes, the device queue which originally scheduled the work could be torn down. In order to prevent this, the patch takes a reference on the device queue when executing the timeout work.

The problem is that the last reference to this queue can be removed before the timeout work can be executed. During teardown, the block system executes a freeze followed by a drain. The freeze drops the last reference on the queue. The drain tries to clean up any outstanding work, including timeout work. After a freeze, the timeout work in the background queue is unable to obtain a reference, and exits early without completing work. The work is now permanently stuck in the queue and it will never be completed. The drain in the device teardown path spins indefinitely.

The bug manifests as a hang that looks like this:
[<ffffffff81829f15>] schedule+0x35/0x80
[<ffffffffc014aea9>] hpsa_scan_start+0x109/0x140 [hpsa]
[<ffffffff810c3cb0>] ? wake_atomic_t_function+0x60/0x60
[<ffffffffc014b602>] hpsa_rescan_ctlr_worker+0x1d2/0x652 [hpsa]
[<ffffffff8109a2c5>] process_one_work+0x165/0x480
[<ffffffff8109a62b>] worker_thread+0x4b/0x4c0
[<ffffffff8109a5e0>] ? process_one_work+0x480/0x480
[<ffffffff810a0808>] kthread+0xd8/0xf0
[<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
[<ffffffff8182e38f>] ret_from_fork+0x3f/0x70
[<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0

The fix exists upstream. It applies, builds, and runs cleanly on Ubuntu's most recent 4.4 kernel.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=4e9b6f20828ac880dbc1fa2fdbafae779473d1af

We hit this bug nearly 100% of the time on some of our HP hardware. The HPSA driver has a tendency to aggressively remove missing devices, so it widens the race. As a result, we've been building our own kernel with this patch applied. It would be really nice if we could get it into mainline Ubuntu.

Let me know what additional information is needed. Thanks!

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1791790

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Steven Haber (sthaber) wrote : Re: Kernel hang on drive pull caused by incomplete backport for bug 1597908

Attaching logs gathered by the apport utility. This is for one of our HP boxes running kernel 4.4.0-131.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
importance: Undecided → High
status: Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 4e9b6f20828ac880dbc1fa2fdbafae779473d1af. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1791790

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
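
The package rule in the note above can be written as a short helper. This is a minimal sketch: the function name `packages_for_test_kernel` is hypothetical, and it assumes the version string starts with `major.minor`, as in `4.4.0-131`.

```python
import re


def packages_for_test_kernel(version):
    """Return the .deb package names to install for a test kernel version."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized kernel version: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    # Kernels older than 4.15 (Bionic) ship the image in two packages.
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    # 4.15 and newer split modules out and use an unsigned image package.
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]
```

For the 4.4 kernel discussed in this bug, the helper returns the linux-image and linux-image-extra packages.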

Thanks in advance!

Steven Haber (sthaber) wrote :

Hey Joseph! I just ran one of our machines through our drive power faulting test. It survived 5 hotplug events without crashing. Usually the crash reproduces 100% of the time, so the fix seems to work. Thanks much!

Steven Haber (sthaber) wrote :

To clarify -- I ran the testing with all of your kernel packages installed and live, except for cloud-tools, which we don't use on HP hardware (haha).

summary: - Kernel hang on drive pull caused by incomplete backport for bug 1597908
+ Kernel hang on drive pull caused by regression introduced by commit
+ 287922eb0b18
Joseph Salisbury (jsalisbury) wrote :
description: updated
tags: added: xenial
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Steven Haber (sthaber) wrote :

I tested the most recent proposed kernel (4.4.0-138) using the same power faulting methodology as before. Everything looks good. I updated the tag.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-138.164

---------------
linux (4.4.0-138.164) xenial; urgency=medium

  * linux: 4.4.0-138.164 -proposed tracker (LP: #1795582)

  * Linux 4.4.155 stable release build is broken on ppc64 (LP: #1795662)
    - powerpc/fadump: Return error when fadump registration fails

  * Kernel hang on drive pull caused by regression introduced by commit
    287922eb0b18 (LP: #1791790)
    - block: Fix a race between blk_cleanup_queue() and timeout handling

  * qeth: use vzalloc for QUERY OAT buffer (LP: #1793086)
    - s390/qeth: use vzalloc for QUERY OAT buffer

  * Page leaking in cachefiles_read_backing_file while vmscan is active
    (LP: #1793430)
    - SAUCE: cachefiles: Page leaking in cachefiles_read_backing_file while vmscan
      is active

  * Bugfix for handling of shadow doorbell buffer (LP: #1788222)
    - nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event

  * Xenial update to 4.4.155 stable release (LP: #1792419)
    - net: 6lowpan: fix reserved space for single frames
    - net: mac802154: tx: expand tailroom if necessary
    - 9p/net: Fix zero-copy path in the 9p virtio transport
    - net: lan78xx: Fix misplaced tasklet_schedule() call
    - spi: davinci: fix a NULL pointer dereference
    - drm/i915/userptr: reject zero user_size
    - powerpc/fadump: handle crash memory ranges array index overflow
    - powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
    - fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
    - 9p/virtio: fix off-by-one error in sg list bounds check
    - net/9p/client.c: version pointer uninitialized
    - net/9p/trans_fd.c: fix race-condition by flushing workqueue before the
      kfree()
    - dm cache metadata: save in-core policy_hint_size to on-disk superblock
    - iio: ad9523: Fix displayed phase
    - iio: ad9523: Fix return value for ad952x_store()
    - vmw_balloon: fix inflation of 64-bit GFNs
    - vmw_balloon: do not use 2MB without batching
    - vmw_balloon: VMCI_DOORBELL_SET does not check status
    - vmw_balloon: fix VMCI use when balloon built into kernel
    - tracing: Do not call start/stop() functions when tracing_on does not change
    - tracing/blktrace: Fix to allow setting same value
    - kthread, tracing: Don't expose half-written comm when creating kthreads
    - uprobes: Use synchronize_rcu() not synchronize_sched()
    - 9p: fix multiple NULL-pointer-dereferences
    - PM / sleep: wakeup: Fix build error caused by missing SRCU support
    - pnfs/blocklayout: off by one in bl_map_stripe()
    - ARM: tegra: Fix Tegra30 Cardhu PCA954x reset
    - mm/tlb: Remove tlb_remove_table() non-concurrent condition
    - iommu/vt-d: Add definitions for PFSID
    - iommu/vt-d: Fix dev iotlb pfsid use
    - osf_getdomainname(): use copy_to_user()
    - sys: don't hold uts_sem while accessing userspace memory
    - userns: move user access out of the mutex
    - ubifs: Fix memory leak in lprobs self-check
    - Revert "UBIFS: Fix potential integer overflow in allocation"
    - ubifs: Check data node size before truncate
    - ubifs: Fix synced_i_size calculation for xattr inodes
    - pwm: ti...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Brad Figg (brad-figg)
tags: added: cscc