Comment 12 for bug 1896350

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Jammy.

I created a new Jammy VM, and installed qemu-utils.

The kernel is 5.15.0-41-generic from -updates.

I ran my reproducer.sh script from the testcase, and within a minute, the nbd request got stuck, and we started seeing hung task timeout oops messages in dmesg:

Jul 20 04:56:20 jammy-nbd kernel: block nbd15: NBD_DISCONNECT
Jul 20 04:56:20 jammy-nbd kernel: block nbd15: Send disconnect failed -32
Jul 20 04:56:20 jammy-nbd sudo[5267]: pam_unix(sudo:session): session closed for user root
Jul 20 04:56:20 jammy-nbd sudo[5271]: ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/qemu-nbd --connect=/dev/nbd15 --cache=writeback --format=qcow2 foo.img
Jul 20 04:56:20 jammy-nbd sudo[5271]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
Jul 20 04:56:20 jammy-nbd kernel: ldm_validate_partition_table(): Disk read failed.
Jul 20 04:56:20 jammy-nbd kernel: Dev nbd15: unable to read RDB block 0
Jul 20 04:56:20 jammy-nbd kernel: nbd15: unable to read partition table
Jul 20 04:56:51 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296,65536B). Runtime 30 seconds
Jul 20 04:57:19 jammy-nbd systemd-udevd[440]: nbd15: Worker [2561] processing SEQNUM=3062 is taking a long time
Jul 20 04:57:21 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296,65536B). Runtime 60 seconds
Jul 20 04:57:52 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296,65536B). Runtime 90 seconds
Jul 20 04:58:23 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296,65536B). Runtime 120 seconds
Jul 20 04:58:23 jammy-nbd kernel: INFO: task qemu-nbd:5280 blocked for more than 120 seconds.
Jul 20 04:58:23 jammy-nbd kernel: Not tainted 5.15.0-41-generic #44-Ubuntu
Jul 20 04:58:23 jammy-nbd kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 04:58:23 jammy-nbd kernel: task:qemu-nbd state:D stack: 0 pid: 5280 ppid: 1 flags:0x00000002
Jul 20 04:58:23 jammy-nbd kernel: Call Trace:
Jul 20 04:58:23 jammy-nbd kernel: <TASK>
Jul 20 04:58:23 jammy-nbd kernel: __schedule+0x23d/0x590
Jul 20 04:58:23 jammy-nbd kernel: ? call_rcu+0xe/0x10
Jul 20 04:58:23 jammy-nbd kernel: schedule+0x4e/0xb0
Jul 20 04:58:23 jammy-nbd kernel: blk_mq_freeze_queue_wait+0x69/0xa0
Jul 20 04:58:23 jammy-nbd kernel: ? wait_woken+0x70/0x70
Jul 20 04:58:23 jammy-nbd kernel: blk_mq_freeze_queue+0x1b/0x30
Jul 20 04:58:23 jammy-nbd kernel: nbd_add_socket+0x76/0x1f0 [nbd]
Jul 20 04:58:23 jammy-nbd kernel: __nbd_ioctl+0x18b/0x340 [nbd]
Jul 20 04:58:23 jammy-nbd kernel: ? security_capable+0x3d/0x60
Jul 20 04:58:23 jammy-nbd kernel: nbd_ioctl+0x81/0xb0 [nbd]
Jul 20 04:58:23 jammy-nbd kernel: blkdev_ioctl+0x12e/0x270
Jul 20 04:58:23 jammy-nbd kernel: ? __fget_files+0x86/0xc0
Jul 20 04:58:23 jammy-nbd kernel: block_ioctl+0x46/0x50
Jul 20 04:58:23 jammy-nbd kernel: __x64_sys_ioctl+0x91/0xc0
Jul 20 04:58:23 jammy-nbd kernel: do_syscall_64+0x5c/0xc0
Jul 20 04:58:23 jammy-nbd kernel: ? exit_to_user_mode_prepare+0x37/0xb0
Jul 20 04:58:23 jammy-nbd kernel: ? syscall_exit_to_user_mode+0x27/0x50
Jul 20 04:58:23 jammy-nbd kernel: ? __x64_sys_recvmsg+0x1d/0x20
Jul 20 04:58:23 jammy-nbd kernel: ? do_syscall_64+0x69/0xc0
Jul 20 04:58:23 jammy-nbd kernel: ? syscall_exit_to_user_mode+0x27/0x50
Jul 20 04:58:23 jammy-nbd kernel: ? __x64_sys_recvmsg+0x1d/0x20
Jul 20 04:58:23 jammy-nbd kernel: ? do_syscall_64+0x69/0xc0
Jul 20 04:58:23 jammy-nbd kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Jul 20 04:58:23 jammy-nbd kernel: RIP: 0033:0x7f6c47e47aff
Jul 20 04:58:23 jammy-nbd kernel: RSP: 002b:00007f6c464d1820 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 20 04:58:23 jammy-nbd kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f6c47e47aff
Jul 20 04:58:23 jammy-nbd kernel: RDX: 0000000000000009 RSI: 000000000000ab00 RDI: 000000000000000b
Jul 20 04:58:23 jammy-nbd kernel: RBP: 00007f6c464d1910 R08: 0000000000000000 R09: 0000000000000001
Jul 20 04:58:23 jammy-nbd kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000b
Jul 20 04:58:23 jammy-nbd kernel: R13: 00007f6c464d1900 R14: 000000001f400000 R15: 00007f6c3c000b90
Jul 20 04:58:23 jammy-nbd kernel: </TASK>

I then rebooted, and enabled -proposed and installed the 5.15.0-43-generic kernel.

I started the reproducer.sh script and left it to run for an hour.

At the end of the hour, the script was still running strong. Requests no longer get stuck when we issue NBD_DISCONNECT, and the issue is solved.

The kernel in -proposed fixes the issue, happy to mark verified.