Comment 14 for bug 1896350

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Focal.

I started a fresh Focal VM, and installed qemu-utils. I then ran reproducer.sh from the testcase section.

The kernel is 5.4.0-124-generic from -updates.

Within 30 seconds of starting the reproducer, the testcase script hung, and the following was in dmesg:

Aug 23 04:49:26 focal-nbd kernel: block nbd15: NBD_DISCONNECT
Aug 23 04:49:26 focal-nbd kernel: block nbd15: Send disconnect failed -32
Aug 23 04:49:26 focal-nbd sudo[1804]: pam_unix(sudo:session): session closed for user root
Aug 23 04:49:26 focal-nbd sudo[1807]: ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/qemu-nbd --connect=/dev/nbd15 --cache=writeback --format=qcow2 foo.img
Aug 23 04:49:26 focal-nbd sudo[1807]: pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)
Aug 23 04:49:26 focal-nbd kernel: ldm_validate_partition_table(): Disk read failed.
Aug 23 04:49:26 focal-nbd kernel: Dev nbd15: unable to read RDB block 0
Aug 23 04:49:26 focal-nbd kernel: nbd15: unable to read partition table
Aug 23 04:49:56 focal-nbd kernel: block nbd15: Possible stuck request 000000004d5cc344: control (read@523988992,36864B). Runtime 30 seconds
Aug 23 04:50:26 focal-nbd systemd-udevd[419]: nbd15: Worker [1198] processing SEQNUM=5582 is taking a long time
Aug 23 04:50:27 focal-nbd kernel: block nbd15: Possible stuck request 000000004d5cc344: control (read@523988992,36864B). Runtime 60 seconds
Aug 23 04:50:58 focal-nbd kernel: block nbd15: Possible stuck request 000000004d5cc344: control (read@523988992,36864B). Runtime 90 seconds
Aug 23 04:51:29 focal-nbd kernel: block nbd15: Possible stuck request 000000004d5cc344: control (read@523988992,36864B). Runtime 120 seconds
Aug 23 04:51:59 focal-nbd kernel: block nbd15: Possible stuck request 000000004d5cc344: control (read@523988992,36864B). Runtime 150 seconds
Aug 23 04:52:26 focal-nbd systemd-udevd[419]: nbd15: Worker [1198] processing SEQNUM=5582 killed
Aug 23 04:52:30 focal-nbd kernel: block nbd15: Possible stuck request 000000004d5cc344: control (read@523988992,36864B). Runtime 180 seconds
Aug 23 04:53:27 focal-nbd kernel: INFO: task qemu-nbd:1815 blocked for more than 120 seconds.
Aug 23 04:53:27 focal-nbd kernel: Not tainted 5.4.0-124-generic #140-Ubuntu
Aug 23 04:53:27 focal-nbd kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 23 04:53:27 focal-nbd kernel: qemu-nbd D 0 1815 1 0x00000000
Aug 23 04:53:27 focal-nbd kernel: Call Trace:
Aug 23 04:53:27 focal-nbd kernel: __schedule+0x2e3/0x740
Aug 23 04:53:27 focal-nbd kernel: ? __kfifo_to_user_r+0xa0/0xa0
Aug 23 04:53:27 focal-nbd kernel: schedule+0x42/0xb0
Aug 23 04:53:27 focal-nbd kernel: blk_mq_freeze_queue_wait+0x4b/0xb0
Aug 23 04:53:27 focal-nbd kernel: ? __wake_up_pollfree+0x40/0x40
Aug 23 04:53:27 focal-nbd kernel: blk_mq_freeze_queue+0x1b/0x20
Aug 23 04:53:27 focal-nbd kernel: nbd_add_socket+0x5e/0x1d0 [nbd]
Aug 23 04:53:27 focal-nbd kernel: nbd_ioctl+0x2f7/0x410 [nbd]
Aug 23 04:53:27 focal-nbd kernel: blkdev_ioctl+0x383/0xa30
Aug 23 04:53:27 focal-nbd kernel: block_ioctl+0x3d/0x50
Aug 23 04:53:27 focal-nbd kernel: do_vfs_ioctl+0x407/0x670
Aug 23 04:53:27 focal-nbd kernel: ? putname+0x4a/0x50
Aug 23 04:53:27 focal-nbd kernel: ksys_ioctl+0x67/0x90
Aug 23 04:53:27 focal-nbd kernel: __x64_sys_ioctl+0x1a/0x20
Aug 23 04:53:27 focal-nbd kernel: do_syscall_64+0x57/0x190
Aug 23 04:53:27 focal-nbd kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 23 04:53:27 focal-nbd kernel: RIP: 0033:0x7fd12b7573ab
Aug 23 04:53:27 focal-nbd kernel: Code: Bad RIP value.
Aug 23 04:53:27 focal-nbd kernel: RSP: 002b:00007fd129fa2a18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 23 04:53:27 focal-nbd kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd12b7573ab
Aug 23 04:53:27 focal-nbd kernel: RDX: 000000000000000b RSI: 000000000000ab00 RDI: 000000000000000d
Aug 23 04:53:27 focal-nbd kernel: RBP: 00007fd129fa2aa8 R08: 0000000000000000 R09: 0000000000000001
Aug 23 04:53:27 focal-nbd kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd129fa2ab0
Aug 23 04:53:27 focal-nbd kernel: R13: 000000000000000d R14: 000000001f400000 R15: 00007fd120000b80

I then rebooted, enabled -proposed and installed kernel 5.4.0-125-generic.

I left the reproducer.sh script running for a bit over an hour, and it was still running perfectly fine when I got back to it. Requests are still moving smoothly, and no longer getting stuck.

The 5.4.0-125-generic kernel in -proposed fixes the issue. Happy to mark it verified.