kernel BUG: io_uring openat triggers audit reference count underflow

Bug #2043841 reported by Dan Clash
This bug affects 1 person
Affects         Status         Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released   Undecided   Unassigned
  Lunar         Fix Committed  Medium      Tim Gardner
  Mantic        Fix Released   Medium      Tim Gardner

Bug Description

I first encountered a bug in 6.2.0-1012-azure #12~22.04.1-Ubuntu that occurs during io_uring openat audit processing. I have a kernel patch that was accepted into the upstream kernel as well as the v6.6, v6.5.9, and v6.1.60 releases. The bug was first introduced in the upstream v5.16 kernel.

I do not see the change yet in:

* The Ubuntu-azure-6.2-6.2.0-1017.17_22.04.1 tag in the jammy kernel repository.
* The Ubuntu-azure-6.5.0-1009.9 tag in the mantic kernel repository.

Can this upstream commit be cherry picked?

The upstream commit is:

03adc61edad49e1bbecfb53f7ea5d78f398fe368

The upstream patch thread is:

https://<email address hidden>/T/#u

The maintainer pull request thread is:

https://lore.kernel.org/lkml/20231019-kampfsport-metapher-e5211d7be247@brauner

The pre-patch discussion thread is:

https://lore<email address hidden>/T/#u

The commit log message is:

commit 03adc61edad49e1bbecfb53f7ea5d78f398fe368
Author: Dan Clash <email address hidden>
Date: Thu Oct 12 14:55:18 2023 -0700

    audit,io_uring: io_uring openat triggers audit reference count underflow

    An io_uring openat operation can update an audit reference count
    from multiple threads resulting in the call trace below.

    A call to io_uring_submit() with a single openat op with a flag of
    IOSQE_ASYNC results in the following reference count updates.

    The first part of the system call performs two increments that do not race.

    do_syscall_64()
      __do_sys_io_uring_enter()
        io_submit_sqes()
          io_openat_prep()
            __io_openat_prep()
              getname()
                getname_flags() /* update 1 (increment) */
                  __audit_getname() /* update 2 (increment) */

    The openat op is queued to an io_uring worker thread which starts the
    opportunity for a race. The system call exit performs one decrement.

    do_syscall_64()
      syscall_exit_to_user_mode()
        syscall_exit_to_user_mode_prepare()
          __audit_syscall_exit()
            audit_reset_context()
               putname() /* update 3 (decrement) */

    The io_uring worker thread performs one increment and two decrements.
    These updates can race with the system call decrement.

    io_wqe_worker()
      io_worker_handle_work()
        io_wq_submit_work()
          io_issue_sqe()
            io_openat()
              io_openat2()
                do_filp_open()
                  path_openat()
                    __audit_inode() /* update 4 (increment) */
                putname() /* update 5 (decrement) */
            __audit_uring_exit()
              audit_reset_context()
                putname() /* update 6 (decrement) */

    The fix is to change the refcnt member of struct audit_names
    from int to atomic_t.

    kernel BUG at fs/namei.c:262!
    Call Trace:
    ...
     ? putname+0x68/0x70
     audit_reset_context.part.0.constprop.0+0xe1/0x300
     __audit_uring_exit+0xda/0x1c0
     io_issue_sqe+0x1f3/0x450
     ? lock_timer_base+0x3b/0xd0
     io_wq_submit_work+0x8d/0x2b0
     ? __try_to_del_timer_sync+0x67/0xa0
     io_worker_handle_work+0x17c/0x2b0
     io_wqe_worker+0x10a/0x350

    Cc: <email address hidden>
    Link: https://<email address hidden>/
    Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
    Signed-off-by: Dan Clash <email address hidden>
    Link: https://<email address hidden>
    Reviewed-by: Jens Axboe <email address hidden>
    Signed-off-by: Christian Brauner <email address hidden>
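
As a user-space illustration of why an atomic counter closes the race described in the commit message, the following is a minimal C++ sketch and not the kernel patch itself. The names Name, put, and dropConcurrently are hypothetical, and std::atomic<int> is only a stand-in for the kernel's atomic_t. Several threads each drop one reference; with an atomic decrement-and-test, exactly one of them can observe that it released the last reference, so the object cannot be freed twice and the count cannot underflow.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a reference-counted object; not a kernel struct.
struct Name {
    std::atomic<int> refcnt{0};
};

// Drop one reference. Returns true only for the caller that released the
// last reference (fetch_sub returns the value held before the decrement).
bool put(Name& n)
{
    return n.refcnt.fetch_sub(1, std::memory_order_acq_rel) == 1;
}

// Start with `holders` references and drop them all from separate threads.
// Returns how many threads concluded that they dropped the last reference.
int dropConcurrently(int holders)
{
    Name n;
    n.refcnt.store(holders);

    std::atomic<int> lastDrops{0};
    std::vector<std::thread> threads;

    for (int i = 0; i < holders; i += 1) {
        threads.emplace_back([&n, &lastDrops] {
            if (put(n)) {
                lastDrops.fetch_add(1);
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }

    // The counter must end at zero, never below it.
    assert(n.refcnt.load() == 0);
    return lastDrops.load();
}
```

With a plain int counter, two threads could both read the same value before either writes it back, which is the kind of lost update that lets putname() hit the BUG check in the call trace above.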

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/2043841/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Dan Clash (daclash) wrote :

This bug is in the Linux kernel, specifically in the filesystem / io_uring / audit areas.

affects: ubuntu → linux (Ubuntu)
Paul White (paulw2u)
affects: linux (Ubuntu) → linux-azure-6.2 (Ubuntu)
tags: added: jammy
Tim Gardner (timg-tpi)
affects: linux-azure-6.2 (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Lunar):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
importance: Undecided → Medium
Changed in linux (Ubuntu Mantic):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Mantic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Lunar):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.2.0-41.42 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar-linux' to 'verification-done-lunar-linux'. If the problem still exists, change the tag 'verification-needed-lunar-linux' to 'verification-failed-lunar-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-v2 verification-needed-lunar-linux
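
The tag names in the verification request above follow a visible pattern: "verification-", then the state, then the series and the source package. A minimal C++ sketch of that naming scheme follows. The enum Verification and the function verificationTag are hypothetical helpers written for illustration only; they are not part of Launchpad or of any Ubuntu tooling.

```cpp
#include <cassert>
#include <string>

// The three states named in the bot message.
enum class Verification { Needed, Done, Failed };

// Build a tag such as "verification-done-lunar-linux" from its parts.
// Hypothetical helper; the pattern is inferred from the tags quoted above.
std::string verificationTag(Verification state,
                            const std::string& series,
                            const std::string& package)
{
    const char* word = "needed";
    if (state == Verification::Done) {
        word = "done";
    } else if (state == Verification::Failed) {
        word = "failed";
    }
    return std::string("verification-") + word + "-" + series + "-" + package;
}
```

For example, verificationTag(Verification::Done, "lunar", "linux") yields the tag that replaces 'verification-needed-lunar-linux' once testing succeeds.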
Dan Clash (daclash) wrote :

The pre-patch discussion thread has a test program that I used to reproduce the issue.
The test program never completes if the bug is present.

I have not been through this process yet. Is it appropriate for me to do the testing? If so, is there a document or a set of steps that describes the appropriate way to test?

https://lore<email address hidden>/T/#u

The following is a copy of the test program:

Test program usage:

./io_uring_open_close_audit_hang --directory /tmp/deleteme --count 10000

Test program source:

// Note: The test program is C++ but could be converted to C.
#include <cassert>
#include <fcntl.h>
#include <filesystem>
#include <getopt.h>
#include <iostream>
#include <liburing.h>

// open and close a file. the file is created if it does not exist.

void
openClose(struct io_uring& ring, std::string fileName)
{
    int ret;
    struct io_uring_cqe* cqe {};
    struct io_uring_sqe* sqe {};
    int fd {};
    int flags {O_RDWR | O_CREAT};
    mode_t mode {0666};

    // openat2

    sqe = io_uring_get_sqe(&ring);
    assert(sqe != nullptr);

    io_uring_prep_openat(sqe, AT_FDCWD, fileName.data(), flags, mode);
    io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);

    ret = io_uring_submit(&ring);
    assert(ret == 1);

    ret = io_uring_wait_cqe(&ring, &cqe);
    assert(ret == 0);

    fd = cqe->res;
    assert(fd > 0);

    io_uring_cqe_seen(&ring, cqe);

    // close

    sqe = io_uring_get_sqe(&ring);
    assert(sqe != nullptr);

    io_uring_prep_close(sqe, fd);
    io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);

    ret = io_uring_submit(&ring);
    assert(ret == 1);

    // wait for the close to complete.
    ret = io_uring_wait_cqe(&ring, &cqe);
    assert(ret == 0);

    // verify that close succeeded.
    assert(cqe->res == 0);

    io_uring_cqe_seen(&ring, cqe);
}

// create 100 files and then open each file twice.

void
openCloseHang(std::string filePath)
{
    int ret;
    struct io_uring ring;

    ret = io_uring_queue_init(8, &ring, 0);
    assert(0 == ret);

    int repeat {3};
    int numFiles {100};

    std::filesystem::create_directory(filePath);

    // files of length 0 are created in the j==0 iteration below.
    // those files are opened and closed during the j>0 iterations.
    // a repeat of 3 results in a fairly reliable reproduction.

    for (int j = 0; j < repeat; j += 1) {
        for (int i = 0; i < numFiles; i += 1) {
            std::string fileName(filePath + "/file" + std::to_string(i));
            openClose(ring, fileName);
        }
    }

    std::filesystem::remove_all(filePath);

    io_uring_queue_exit(&ring);
}

int
main(int argc, char** argv)
{
    std::string filePath {};
    int iterations {};

    struct option options[]
    {
        {"help", no_argument, 0, 'h'}, {"directory", required_argument, 0, 'd'},
            {"count", required_argument, 0, 'c'},
        {
            0, 0, 0, 0
        }
    };
    bool printUsage {false};
    int val {};

    while ((val = getopt_long_only(argc, argv, "", options, nullptr)) != -1) {
        if (val == 'h') {
            printUsage = true;
        } else if (val == 'd') ...


Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.5.0-16.16 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux' to 'verification-done-mantic-linux'. If the problem still exists, change the tag 'verification-needed-mantic-linux' to 'verification-failed-mantic-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-v2 verification-needed-mantic-linux
Roxana Nicolescu (roxanan) wrote :

Dan Clash, apologies for the late reply. Next time, feel free to test it yourself, since you know the details better than anyone. Just use the latest version in -proposed.
Providing the test info was really useful, though: I managed to test the fix without spending much time on it, so thanks for that :)

Roxana Nicolescu (roxanan) wrote :

Ran ./io_uring_open_close_audit_hang --directory /tmp/deleteme --count 10000 on 6.5.0-17-generic
and it finished

tags: added: verification-done-lunar-linux verification-done-mantic-linux
removed: verification-needed-lunar-linux verification-needed-mantic-linux
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.5.0-17.17

---------------
linux (6.5.0-17.17) mantic; urgency=medium

  * mantic/linux: 6.5.0-17.17 -proposed tracker (LP: #2049026)

  * [UBUNTU 23.04] Regression: Ubuntu 23.04/23.10 do not include uvdevice
    anymore (LP: #2048919)
    - [Config] Enable S390_UV_UAPI (built-in)

linux (6.5.0-16.16) mantic; urgency=medium

  * mantic/linux: 6.5.0-16.16 -proposed tracker (LP: #2048372)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] remove helper scripts
    - [Packaging] update annotations scripts
    - debian/dkms-versions -- update from kernel-versions (main/2024.01.08)

  * Add missing RPL P/U CPU IDs (LP: #2047398)
    - drm/i915/rpl: Update pci ids for RPL P/U

  * Fix BCM57416 lost after resume (LP: #2047518)
    - bnxt_en: Clear resource reservation during resume

  * Hotplugging SCSI disk in QEMU VM fails (LP: #2047382)
    - Revert "PCI: acpiphp: Reassign resources on bridge if necessary"

  * Update bnxt_en with bug fixes and support for Broadcom 5760X network
    adapters (LP: #2045796)
    - bnxt_en: use dev_consume_skb_any() in bnxt_tx_int
    - eth: bnxt: move and rename reset helpers
    - eth: bnxt: take the bit to set as argument of bnxt_queue_sp_work()
    - eth: bnxt: handle invalid Tx completions more gracefully
    - eth: bnxt: fix one of the W=1 warnings about fortified memcpy()
    - eth: bnxt: fix warning for define in struct_group
    - bnxt_en: Fix W=1 warning in bnxt_dcb.c from fortify memcpy()
    - bnxt_en: Fix W=stringop-overflow warning in bnxt_dcb.c
    - bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP
    - bnxt_en: Let the page pool manage the DMA mapping
    - bnxt_en: Increment rx_resets counter in bnxt_disable_napi()
    - bnxt_en: Save ring error counters across reset
    - bnxt_en: Display the ring error counters under ethtool -S
    - bnxt_en: Add tx_resets ring counter
    - bnxt: use the NAPI skb allocation cache
    - bnxt_en: Update firmware interface to 1.10.2.171
    - bnxt_en: Enhance hwmon temperature reporting
    - bnxt_en: Move hwmon functions into a dedicated file
    - bnxt_en: Modify the driver to use hwmon_device_register_with_info
    - bnxt_en: Expose threshold temperatures through hwmon
    - bnxt_en: Use non-standard attribute to expose shutdown temperature
    - bnxt_en: Event handler for Thermal event
    - bnxt_en: Support QOS and TPID settings for the SRIOV VLAN
    - bnxt_en: Update VNIC resource calculation for VFs
    - Revert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN"
    - eth: bnxt: fix backward compatibility with older devices
    - bnxt_en: Do not call sleeping hwmon_notify_event() from NAPI
    - bnxt_en: Fix invoking hwmon_notify_event
    - bnxt_en: add infrastructure to lookup ethtool link mode
    - bnxt_en: support lane configuration via ethtool
    - bnxt_en: refactor speed independent ethtool modes
    - bnxt_en: Refactor NRZ/PAM4 link speed related logic
    - bnxt_en: convert to linkmode_set_bit() API
    - bnxt_en: extend media types to supported and autoneg modes
    - bnxt_en: Fix 2...

Changed in linux (Ubuntu Mantic):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gcp-6.5/6.5.0-1013.13~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-gcp-6.5' to 'verification-done-jammy-linux-gcp-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-gcp-6.5' to 'verification-failed-jammy-linux-gcp-6.5'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-gcp-6.5-v2 verification-needed-jammy-linux-gcp-6.5
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.5.0-1013.13 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-azure' to 'verification-done-mantic-linux-azure'. If the problem still exists, change the tag 'verification-needed-mantic-linux-azure' to 'verification-failed-mantic-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-azure-v2 verification-needed-mantic-linux-azure
Dan Clash (daclash) wrote :

I previously verified that the test program hangs when 6.5.0-1011-azure is installed.
I have been testing with 6.5.0-1012-azure from the Canonical Kernel PPA for a while with no issues.
I upgraded to 6.5.0-1013-azure just now and the test program still passes.

devvm7 ~ $ uname -a
Linux daclashlinux7 6.5.0-1013-azure #13~22.04.1-Ubuntu SMP Tue Feb 6 20:34:09 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

devvm7 ~ $ sudo dmesg --clear

devvm7 ~ $ ./io_uring_open_close_audit_hang --directory /tmp/deleteme --count 10000
i=0
i=100
i=200
...
i=9800
i=9900

devvm7 ~ $ sudo dmesg
devvm7 ~ $

The test program does not hang when running with 6.5.0-1012-azure.

daclash@daclashlinux4:~$ uname -a
Linux daclashlinux4 6.5.0-1012-azure #12~22.04.1-Ubuntu SMP Tue Jan 16 21:24:44 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

daclash@daclashlinux4:~$ sudo dmesg --clear

daclash@daclashlinux4:~$ ./io_uring_open_close_audit_hang --directory /tmp/deleteme --count 10000
...
i=9900

daclash@daclashlinux4:~$ sudo dmesg
daclash@daclashlinux4:~$

The test program does hang when running with 6.5.0-1011-azure.

daclash@daclashlinux4:~$ uname -a
Linux daclashlinux4 6.5.0-1011-azure #11~22.04.1-Ubuntu SMP Mon Jan 15 16:59:12 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

daclash@daclashlinux4:~$ sudo dmesg --clear
daclash@daclashlinux4:~$ ./io_uring_open_close_audit_hang --directory /tmp/deleteme --count 10000
i=0
...
i=5900
i=6000
^C

daclash@daclashlinux4:~$ sudo dmesg | grep "kernel BUG at fs/namei.c"
[ 125.159601] kernel BUG at fs/namei.c:264!
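
The dmesg check above amounts to filtering log text for one marker string. A minimal C++ sketch of that filter, for anyone who wants to fold the check into the test program, is shown here. The function matchingLines is a hypothetical helper, and it works on log text that has already been captured into a string; it does not read the kernel log itself.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Return every line of `log` that contains `needle`, in order.
// Hypothetical helper; equivalent in spirit to piping text through grep.
std::vector<std::string> matchingLines(const std::string& log,
                                       const std::string& needle)
{
    std::vector<std::string> hits;
    std::istringstream in(log);
    std::string line;

    while (std::getline(in, line)) {
        if (line.find(needle) != std::string::npos) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

An empty result for the needle "kernel BUG at fs/namei.c" corresponds to the clean dmesg output seen on the fixed kernels.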

Dan Clash (daclash) wrote :

Please let me know if testing from the Canonical Kernel PPA is sufficient or if I should test again using -proposed.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-6.5/6.5.0-1013.13~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws-6.5' to 'verification-done-jammy-linux-aws-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws-6.5' to 'verification-failed-jammy-linux-aws-6.5'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-6.5-v2 verification-needed-jammy-linux-aws-6.5
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-6.5/6.5.0-1014.14 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-6.5' to 'verification-done-jammy-linux-nvidia-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-6.5' to 'verification-failed-jammy-linux-nvidia-6.5'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-6.5-v2 verification-needed-jammy-linux-nvidia-6.5