xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()

Bug #1966803 reported by Kellen Renshaw
This bug affects 2 people
Affects         Status        Importance  Assigned to     Milestone
linux (Ubuntu)  Invalid       Undecided   Unassigned
Focal           Fix Released  Medium      Kellen Renshaw

Bug Description

SRU Justification:

[Impact]

* The XFS filesystem can deadlock on kernels older than 5.5, which hangs IO to and from the affected filesystem. Sample backtraces have been added as a comment.

[Fix]

* 93597ae8dac0149b5c00b787cba6bf7ba213e666 93597ae8dac0 "xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()"

* This is from the upstream 5.5 kernel.
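
As the commit title suggests, the problem is a lock-order inversion between the AGI and AGF buffers. The following is only a user-space sketch of that idea, not kernel code: `agi` and `agf` are plain Python locks standing in for the buffer locks, and the function names are hypothetical. It assumes the create path takes AGI before AGF (consistent with the xfs_dialloc backtrace in the comments) and shows the rename path taking them in the same order, which is the general shape of a lock-ordering fix.

```python
import threading

# Stand-ins for the AGI and AGF buffer locks (assumption: plain Python
# locks, not the kernel's buffer locks).
agi = threading.Lock()
agf = threading.Lock()
completed = []


def create_path():
    # Inode allocation: AGI first, then AGF.
    with agi:
        with agf:
            completed.append("create")


def rename_path_fixed():
    # Rename over an existing target, taking AGI before AGF so both paths
    # agree on the order. Taking AGF first here while create_path holds AGI
    # would allow an ABBA deadlock.
    with agi:
        with agf:
            completed.append("rename")


def run_once(timeout=5.0):
    """Run both paths concurrently and report which ones finished."""
    completed.clear()
    threads = [
        threading.Thread(target=create_path),
        threading.Thread(target=rename_path_fixed),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    return sorted(completed)
```

With a consistent order, neither thread can hold one lock while waiting on a holder of the other, so both paths always finish.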

[Test Plan]

* Set up an Ubuntu Bionic/Focal installation using kernel 5.4.

* Create and mount an XFS filesystem on a block device.

* Exercise the filesystem to verify that IO does not hang.
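
One possible way to exercise the filesystem is sketched below. It is an assumption about what a useful workload might look like, not the procedure used for this SRU, and it is not a reliable reproducer. The `exercise` function and the file names are hypothetical; the directory passed in would be a path on the mounted XFS filesystem. One thread creates new files while another renames files over an existing target, and the run is flagged if either worker fails to finish in time.

```python
import os
import tempfile
import threading


def exercise(directory, iterations=200, timeout=60.0):
    """Create files and rename over an existing target concurrently.

    Returns True if both workers finished; each worker is waited on for
    up to `timeout` seconds.
    """
    def creator():
        for i in range(iterations):
            with open(os.path.join(directory, f"new-{i}"), "w") as f:
                f.write("x")

    def renamer():
        target = os.path.join(directory, "target")
        for i in range(iterations):
            src = os.path.join(directory, f"src-{i}")
            with open(src, "w") as f:
                f.write("y")
            # From the second iteration on, the rename target already exists.
            os.replace(src, target)

    workers = [
        threading.Thread(target=creator, daemon=True),
        threading.Thread(target=renamer, daemon=True),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join(timeout)
    return not any(w.is_alive() for w in workers)


if __name__ == "__main__":
    # A temporary directory is used here; point this at the XFS mount instead.
    with tempfile.TemporaryDirectory() as d:
        print("completed" if exercise(d) else "possible hang")
```

A worker that is still alive after the timeout only hints at a hang; the kernel log would still need to be checked for hung-task messages.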

[Where problems could occur]

* This patch could cause locking issues on XFS filesystems, requiring a system restart to correct.

[Other Info]

* The bug is difficult to reproduce. A test kernel with the above patch, run on affected systems, prevented the issue.

* Backports to 4.15 and earlier kernels have been omitted, as the upstream patch does not apply cleanly and the issue has not been reproduced on them.

tags: added: sts
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1966803

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Kellen Renshaw (krenshaw) wrote :

Example backtraces:
Mar 4 05:41:40 host kernel: [291932.968664] INFO: task tar:44877 blocked for more than 120 seconds.
Mar 4 05:41:40 host kernel: [291932.968826] Tainted: P OE 5.4.0-100-generic #113~18.04.1-Ubuntu
Mar 4 05:41:40 host kernel: [291932.969019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 4 05:41:40 host kernel: [291932.969222] tar D 0 44877 44865 0x00004000
Mar 4 05:41:40 host kernel: [291932.969226] Call Trace:
Mar 4 05:41:40 host kernel: [291932.969239] __schedule+0x292/0x720
Mar 4 05:41:40 host kernel: [291932.969315] ? xfs_buf_find.isra.34+0x205/0x640 [xfs]
Mar 4 05:41:40 host kernel: [291932.969321] schedule+0x33/0xa0
Mar 4 05:41:40 host kernel: [291932.969325] schedule_timeout+0x1d3/0x320
Mar 4 05:41:40 host kernel: [291932.969381] ? xfs_btree_update+0x7f/0xe0 [xfs]
Mar 4 05:41:40 host kernel: [291932.969444] ? xfs_buf_find.isra.34+0x205/0x640 [xfs]
Mar 4 05:41:40 host kernel: [291932.969448] __down+0x91/0xe0
Mar 4 05:41:40 host kernel: [291932.969454] ? page_counter_try_charge+0x10/0xd0
Mar 4 05:41:40 host kernel: [291932.969460] down+0x41/0x50
Mar 4 05:41:40 host kernel: [291932.969463] ? down+0x41/0x50
Mar 4 05:41:40 host kernel: [291932.969521] xfs_buf_lock+0x3c/0xf0 [xfs]
Mar 4 05:41:40 host kernel: [291932.969575] xfs_buf_find.isra.34+0x205/0x640 [xfs]
Mar 4 05:41:40 host kernel: [291932.969629] xfs_buf_get_map+0x43/0x2b0 [xfs]
Mar 4 05:41:40 host kernel: [291932.969684] xfs_buf_read_map+0x2c/0x1c0 [xfs]
Mar 4 05:41:40 host kernel: [291932.969757] xfs_trans_read_buf_map+0xd5/0x360 [xfs]
Mar 4 05:41:40 host kernel: [291932.969805] xfs_read_agf+0x92/0x120 [xfs]
Mar 4 05:41:40 host kernel: [291932.969851] xfs_alloc_read_agf+0x47/0x1c0 [xfs]
Mar 4 05:41:40 host kernel: [291932.969896] xfs_alloc_fix_freelist+0x2d2/0x500 [xfs]
Mar 4 05:41:40 host kernel: [291932.969899] ? down+0x2e/0x50
Mar 4 05:41:40 host kernel: [291932.969955] ? xfs_buf_find.isra.34+0x205/0x640 [xfs]
Mar 4 05:41:40 host kernel: [291932.969960] ? radix_tree_lookup+0xd/0x10
Mar 4 05:41:40 host kernel: [291932.970016] ? xfs_perag_get+0x2c/0xc0 [xfs]
Mar 4 05:41:40 host kernel: [291932.970061] xfs_alloc_vextent+0x334/0x590 [xfs]
Mar 4 05:41:40 host kernel: [291932.970115] xfs_ialloc_ag_alloc+0x17e/0x710 [xfs]
Mar 4 05:41:40 host kernel: [291932.970183] ? xfs_trans_read_buf_map+0x183/0x360 [xfs]
Mar 4 05:41:40 host kernel: [291932.970238] xfs_dialloc+0x139/0x280 [xfs]
Mar 4 05:41:40 host kernel: [291932.970302] xfs_ialloc+0x7c/0x520 [xfs]
Mar 4 05:41:40 host kernel: [291932.970364] xfs_dir_ialloc+0x62/0x1e0 [xfs]
Mar 4 05:41:40 host kernel: [291932.970422] xfs_create+0x3d9/0x570 [xfs]
Mar 4 05:41:40 host kernel: [291932.970479] xfs_generic_create+0x20e/0x2f0 [xfs]
Mar 4 05:41:40 host kernel: [291932.970534] xfs_vn_mknod+0x14/0x20 [xfs]
Mar 4 05:41:40 host kernel: [291932.970588] xfs_vn_create+0x13/0x20 [xfs]
Mar 4 05:41:40 host kernel: [291932.970593] path_openat+0x12cb/0x16a0
Mar 4 05:41:40 host kernel: [291932.970659] ? xfs_trans_get_bud+0x4/0x70 [xfs]
Mar 4 05:41:40 host kernel: [291932.970717] ? xfs_perag_get+0...


description: updated
description: updated
Kellen Renshaw (krenshaw) wrote :

Unable to run the apport command.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
assignee: nobody → Kellen Renshaw (krenshaw)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.4.0-110.124 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Kellen Renshaw (krenshaw) wrote :

Verification for Focal

Ran the xfstests (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git) on the current 5.4.0-109-generic #123-Ubuntu kernel and the -proposed kernel 5.4.0-110-generic #124-Ubuntu in a VM to exercise the XFS filesystem thoroughly. Detailed log files of the tests are attached.

The tests completed successfully on both kernels, with the -110 kernel failing 2 fewer tests than the -109 kernel.

109 (-updates) - Failed 76 of 373 tests
110 (-proposed) - Failed 74 of 373 tests

The test results indicate no regressions in the -proposed kernel.
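
The failure counts alone (76 versus 74) do not show whether any test newly fails on the -proposed kernel. One way to check that, sketched here with hypothetical test names rather than the attached logs, is to compare the sets of failed test names from the two runs. The `new_failures` helper is an illustration, not part of xfstests.

```python
def new_failures(baseline_failed, proposed_failed):
    """Return tests that fail on the proposed kernel but not on the baseline."""
    return sorted(set(proposed_failed) - set(baseline_failed))


# Hypothetical failed-test lists; the real names are in the attached logs.
baseline = {"generic/001", "xfs/002", "xfs/003"}
proposed = {"generic/001", "xfs/002"}

print(new_failures(baseline, proposed))  # prints []
```

An empty result means every failure on the proposed kernel was already failing on the baseline.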

Kellen Renshaw (krenshaw) wrote :
tags: added: verification-done-focal
removed: verification-needed-focal
Kellen Renshaw (krenshaw) wrote :

Verification for Bionic

Ran the xfstests (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git) on the current 5.4.0-109-generic #123~18.04.1-Ubuntu kernel and the -proposed kernel 5.4.0-110-generic #124~18.04.1-Ubuntu in a VM to exercise the XFS filesystem thoroughly. Detailed log files of the tests are attached.

The tests completed successfully on both kernels, with the -110 kernel showing no regressions relative to the -109 kernel.

Kellen Renshaw (krenshaw) wrote :
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-110.124

---------------
linux (5.4.0-110.124) focal; urgency=medium

  * focal/linux: 5.4.0-110.124 -proposed tracker (LP: #1969053)

  * net/mlx5e: Fix page DMA map/unmap attributes (LP: #1967292)
    - net/mlx5e: Fix page DMA map/unmap attributes

  * xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
    (LP: #1966803)
    - xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()

  * LRMv6: add multi-architecture support (LP: #1968774)
    - [Packaging] resync dkms-build{,--nvidia-N}

  * xfrm interface cannot be changed anymore (LP: #1968591)
    - xfrm: fix the if_id check in changelink

  * Use kernel-testing repo from launchpad for ADT tests (LP: #1968016)
    - [Debian] Use kernel-testing repo from launchpad

  * vmx_ldtr_test in ubuntu_kvm_unit_tests failed (FAIL: Expected 0 for L1 LDTR
    selector (got 50)) (LP: #1956315)
    - KVM: nVMX: Set LDTR to its architecturally defined value on nested VM-Exit

  * [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot" which causes
    Bus Fatal Error when rebooting system with BCM5720 NIC (LP: #1917471)
    - Revert "PM: ACPI: reboot: Use S5 for reboot"

  * Focal update: v5.4.181 upstream stable release (LP: #1967582)
    - Makefile.extrawarn: Move -Wunaligned-access to W=1
    - HID:Add support for UGTABLET WP5540
    - Revert "svm: Add warning message for AVIC IPI invalid target"
    - serial: parisc: GSC: fix build when IOSAPIC is not set
    - parisc: Drop __init from map_pages declaration
    - parisc: Fix data TLB miss in sba_unmap_sg
    - parisc: Fix sglist access in ccio-dma.c
    - btrfs: send: in case of IO error log it
    - platform/x86: ISST: Fix possible circular locking dependency detected
    - selftests: rtc: Increase test timeout so that all tests run
    - net: ieee802154: at86rf230: Stop leaking skb's
    - selftests/zram: Skip max_comp_streams interface on newer kernel
    - selftests/zram01.sh: Fix compression ratio calculation
    - selftests/zram: Adapt the situation that /dev/zram0 is being used
    - ax25: improve the incomplete fix to avoid UAF and NPD bugs
    - vfs: make freeze_super abort when sync_filesystem returns error
    - quota: make dquot_quota_sync return errors from ->sync_fs
    - nvme: fix a possible use-after-free in controller reset during load
    - nvme-tcp: fix possible use-after-free in transport error_recovery work
    - nvme-rdma: fix possible use-after-free in transport error_recovery work
    - drm/amdgpu: fix logic inversion in check
    - Revert "module, async: async_synchronize_full() on module init iff async is
      used"
    - ftrace: add ftrace_init_nop()
    - module/ftrace: handle patchable-function-entry
    - arm64: module: rework special section handling
    - arm64: module/ftrace: intialize PLT at load time
    - iwlwifi: fix use-after-free
    - drm/radeon: Fix backlight control on iMac 12,1
    - ext4: check for out-of-order index extents in ext4_valid_extent_entries()
    - ext4: check for inconsistent extents between index and leaf block
    - ext4: prevent partial update of the extent blocks
    - taskstats: Cleanup t...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released