os-brick

Bug #1592520
Comment #1

Comment 1 for bug 1592520

Revision history for this message

Rodrigo Freire (rbs-j) wrote on 2016-06-17:

Ok, I have found a reproducer for this issue.

0. Setup a LVM volume in your system. In this case, it is a 4x1 TB PVs (LUNs 3EB, 3EA, 3E9 and 3E8), resulting in a single 4 TB LV.

1. Presented 3 20 GB (LUNs 038, 037 and 034) volumes to a tenant. From the compute perspective:
   [root@compute-0 ~]# multipath -ll
   (<redacted>00000038) dm-6 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:3 sdac 65:192 active ready running
   | `- 1:0:0:3 sdu 65:64 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:3 sdm 8:192 active ready running
     `- 4:0:0:3 sde 8:64 active ready running
   (<redacted>000003eb) dm-0 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:203 sdi 8:128 active ready running
   | `- 4:0:1:203 sdq 65:0 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:203 sdag 66:0 active ready running
     `- 1:0:0:203 sdy 65:128 active ready running
   (<redacted>000003ea) dm-2 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:0:202 sdx 65:112 active ready running
   | `- 1:0:1:202 sdaf 65:240 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:202 sdh 8:112 active ready running
     `- 4:0:1:202 sdp 8:240 active ready running
   (<redacted>000003e9) dm-4 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:1:201 sdo 8:224 active ready running
   | `- 4:0:0:201 sdg 8:96 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:201 sdae 65:224 active ready running
     `- 1:0:0:201 sdw 65:96 active ready running
   (<redacted>00000031) dm-5 HITACHI ,DF600F
   size=270G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:0 sdb 8:16 active ready running
   | `- 4:0:1:0 sdj 8:144 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:0 sdz 65:144 active ready running
     `- 1:0:0:0 sdr 65:16 active ready running
   (<redacted>000003e8) dm-1 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:200 sdad 65:208 active ready running
   | `- 1:0:0:200 sdv 65:80 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:200 sdf 8:80 active ready running
     `- 4:0:1:200 sdn 8:208 active ready running
   (<redacted>00000034) dm-7 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:1 sdaa 65:160 active ready running
   | `- 1:0:0:1 sds 65:32 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:1 sdk 8:160 active ready running
     `- 4:0:0:1 sdc 8:32 active ready running
   (<redacted>00000037) dm-3 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:2 sdd 8:48 active ready running
   | `- 4:0:1:2 sdl 8:176 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:2 sdab 65:176 active ready running
     `- 1:0:0:2 sdt 65:48 active ready running

2. Check for D-state processes:
   [root@compute-0 ~]# ps ax | grep D
      PID TTY STAT TIME COMMAND
     2069 ? Ssl 0:00 /usr/sbin/gssproxy -D
     4279 ? Ss 0:00 /usr/sbin/sshd -D
     9489 pts/0 S+ 0:00 grep --color=auto D

3. No D-state processes were found. Now, we will write data do a multipath device. 50 MBs. sync write (means: will exit only after comitted to the disk), to ensure the device is working correctly.
   [root@compute-0 ~]# dd if=/dev/zero of=/dev/mapper/mpatht bs=1M count=50 oflag=sync
   50+0 records in
   50+0 records out
   52428800 bytes (52 MB) copied, 0.122838 s, 427 MB/s

4. Success. Now will detach the LUNs

5. WHOOPS. pid 9576 (systemd-udevd) got stuck in D-state.

6. Listing the multipath devices again:
   [root@compute-0 ~]# multipath -ll
   (<redacted>00000038) dm-6 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:3 sdac 65:192 active ready running
   | `- 1:0:0:3 sdu 65:64 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:3 sdm 8:192 active ready running
     `- 4:0:0:3 sde 8:64 active ready running
   (<redacted>000003eb) dm-0 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:203 sdi 8:128 active ready running
   | `- 4:0:1:203 sdq 65:0 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:203 sdag 66:0 active ready running
     `- 1:0:0:203 sdy 65:128 active ready running
   (<redacted>000003ea) dm-2 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:0:202 sdx 65:112 active ready running
   | `- 1:0:1:202 sdaf 65:240 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:202 sdh 8:112 active ready running
     `- 4:0:1:202 sdp 8:240 active ready running
   (<redacted>000003e9) dm-4 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:1:201 sdo 8:224 active ready running
   | `- 4:0:0:201 sdg 8:96 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:201 sdae 65:224 active ready running
     `- 1:0:0:201 sdw 65:96 active ready running
   (<redacted>00000031) dm-5 HITACHI ,DF600F
   size=270G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:0 sdb 8:16 active ready running
   | `- 4:0:1:0 sdj 8:144 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:0 sdz 65:144 active ready running
     `- 1:0:0:0 sdr 65:16 active ready running
   (<redacted>000003e8) dm-1 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:200 sdad 65:208 active ready running
   | `- 1:0:0:200 sdv 65:80 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:200 sdf 8:80 active ready running
     `- 4:0:1:200 sdn 8:208 active ready running
   (<redacted>00000034) dm-7 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:1 sdaa 65:160 active ready running
   | `- 1:0:0:1 sds 65:32 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:1 sdk 8:160 active ready running
     `- 4:0:0:1 sdc 8:32 active ready running
   (<redacted>00000037) dm-3 <========= BOGUS
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw <========= BOGUS

7. BAD. LUN 037 got stuck with no child underlying devices.

7. Let's see who's holding /dev/dm-3 and preventing its removal:
[root@compute-0 ~]# lsof -n | grep dm-
systemd-u 9576 root 8r BLK 253,3 0x4ffff0000 40041 /dev/dm-3

8. It's udev. Print a thread dump, what is systemd-udev doing:
[root@compute-0 ~]# echo t > /proc/sysrq-trigger

[...]

   [ 143.410650] systemd-udevd D ffff88fee3e6d080 0 9576 1166 0x00000080
   [ 143.410652] ffff887ee4c8fb90 0000000000000086 ffff887ee57c2e00 ffff887ee4c8ffd8
   [ 143.410655] ffff887ee4c8ffd8 ffff887ee4c8ffd8 ffff887ee57c2e00 ffff88bf3fb94780
   [ 143.410659] 0000000000000000 7fffffffffffffff ffffffff81168950 ffff887ee4c8fcf0
   [ 143.410663] Call Trace:
   [ 143.410665] [<ffffffff81168950>] ? sleep_on_page+0x20/0x20
   [ 143.410667] [<ffffffff8163a9c9>] schedule+0x29/0x70
   [ 143.410669] [<ffffffff816386b9>] schedule_timeout+0x209/0x2d0
   [ 143.410672] [<ffffffff8101c829>] ? read_tsc+0x9/0x10
   [ 143.410674] [<ffffffff81168950>] ? sleep_on_page+0x20/0x20
   [ 143.410676] [<ffffffff81639ffe>] io_schedule_timeout+0xae/0x130
   [ 143.410679] [<ffffffff8163a098>] io_schedule+0x18/0x20
   [ 143.410681] [<ffffffff8116895e>] sleep_on_page_killable+0xe/0x40
   [ 143.410683] [<ffffffff8163897b>] __wait_on_bit_lock+0x5b/0xc0
   [ 143.410685] [<ffffffff81168af8>] __lock_page_killable+0x78/0xa0
   [ 143.410687] [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
   [ 143.410689] [<ffffffff8116ad6e>] generic_file_aio_read+0x50e/0x750
   [ 143.410694] [<ffffffff8121969c>] blkdev_aio_read+0x4c/0x70
   [ 143.410696] [<ffffffff811ddcad>] do_sync_read+0x8d/0xd0
   [ 143.410698] [<ffffffff811de40c>] vfs_read+0x9c/0x170
   [ 143.410700] [<ffffffff811def5f>] SyS_read+0x7f/0xe0
   [ 143.410702] [<ffffffff81645a49>] system_call_fastpath+0x16/0x1b

Another similar trace is seen in https://bugzilla.redhat.com/show_bug.cgi?id=1173739#c9.

Ok, I have found a reproducer for this issue.

0. Setup a LVM volume in your system. In this case, it is a 4x1 TB PVs (LUNs 3EB, 3EA, 3E9 and 3E8), resulting in a single 4 TB LV. 
   
1. Presented 3 20 GB (LUNs 038, 037 and 034) volumes to a tenant. From the compute perspective:
   [root@compute-0 ~]# multipath -ll
   (<redacted>00000038) dm-6 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:3   sdac 65:192 active ready running
   | `- 1:0:0:3   sdu  65:64  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:3   sdm  8:192  active ready running
     `- 4:0:0:3   sde  8:64   active ready running
   (<redacted>000003eb) dm-0 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:203 sdi  8:128  active ready running
   | `- 4:0:1:203 sdq  65:0   active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:203 sdag 66:0   active ready running
     `- 1:0:0:203 sdy  65:128 active ready running
   (<redacted>000003ea) dm-2 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:0:202 sdx  65:112 active ready running
   | `- 1:0:1:202 sdaf 65:240 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:202 sdh  8:112  active ready running
     `- 4:0:1:202 sdp  8:240  active ready running
   (<redacted>000003e9) dm-4 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:1:201 sdo  8:224  active ready running
   | `- 4:0:0:201 sdg  8:96   active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:201 sdae 65:224 active ready running
     `- 1:0:0:201 sdw  65:96  active ready running
   (<redacted>00000031) dm-5 HITACHI ,DF600F
   size=270G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:0   sdb  8:16   active ready running
   | `- 4:0:1:0   sdj  8:144  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:0   sdz  65:144 active ready running
     `- 1:0:0:0   sdr  65:16  active ready running
   (<redacted>000003e8) dm-1 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:200 sdad 65:208 active ready running
   | `- 1:0:0:200 sdv  65:80  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:200 sdf  8:80   active ready running
     `- 4:0:1:200 sdn  8:208  active ready running
   (<redacted>00000034) dm-7 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:1   sdaa 65:160 active ready running
   | `- 1:0:0:1   sds  65:32  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:1   sdk  8:160  active ready running
     `- 4:0:0:1   sdc  8:32   active ready running
   (<redacted>00000037) dm-3 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:2   sdd  8:48   active ready running
   | `- 4:0:1:2   sdl  8:176  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:2   sdab 65:176 active ready running
     `- 1:0:0:2   sdt  65:48  active ready running

2. Check for D-state processes:
   [root@compute-0 ~]# ps ax | grep D
      PID TTY      STAT   TIME COMMAND
     2069 ?        Ssl    0:00 /usr/sbin/gssproxy -D
     4279 ?        Ss     0:00 /usr/sbin/sshd -D
     9489 pts/0    S+     0:00 grep --color=auto D

4. Success. Now will detach the LUNs

5. WHOOPS. pid 9576 (systemd-udevd) got stuck in D-state.

6. Listing the multipath devices again:
   [root@compute-0 ~]# multipath -ll
   (<redacted>00000038) dm-6 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:3   sdac 65:192 active ready running
   | `- 1:0:0:3   sdu  65:64  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:3   sdm  8:192  active ready running
     `- 4:0:0:3   sde  8:64   active ready running
   (<redacted>000003eb) dm-0 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:203 sdi  8:128  active ready running
   | `- 4:0:1:203 sdq  65:0   active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:203 sdag 66:0   active ready running
     `- 1:0:0:203 sdy  65:128 active ready running
   (<redacted>000003ea) dm-2 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:0:202 sdx  65:112 active ready running
   | `- 1:0:1:202 sdaf 65:240 active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:202 sdh  8:112  active ready running
     `- 4:0:1:202 sdp  8:240  active ready running
   (<redacted>000003e9) dm-4 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:1:201 sdo  8:224  active ready running
   | `- 4:0:0:201 sdg  8:96   active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:201 sdae 65:224 active ready running
     `- 1:0:0:201 sdw  65:96  active ready running
   (<redacted>00000031) dm-5 HITACHI ,DF600F
   size=270G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 4:0:0:0   sdb  8:16   active ready running
   | `- 4:0:1:0   sdj  8:144  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 1:0:1:0   sdz  65:144 active ready running
     `- 1:0:0:0   sdr  65:16  active ready running
   (<redacted>000003e8) dm-1 HITACHI ,DF600F
   size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:200 sdad 65:208 active ready running
   | `- 1:0:0:200 sdv  65:80  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:0:200 sdf  8:80   active ready running
     `- 4:0:1:200 sdn  8:208  active ready running
   (<redacted>00000034) dm-7 HITACHI ,DF600F
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
   |-+- policy='round-robin 0' prio=1 status=active
   | |- 1:0:1:1   sdaa 65:160 active ready running
   | `- 1:0:0:1   sds  65:32  active ready running
   `-+- policy='round-robin 0' prio=0 status=enabled
     |- 4:0:1:1   sdk  8:160  active ready running
     `- 4:0:0:1   sdc  8:32   active ready running
   (<redacted>00000037) dm-3                                   <========= BOGUS
   size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw  <========= BOGUS

7. BAD. LUN 037 got stuck with no child underlying devices.

7. Let's see who's holding /dev/dm-3 and preventing its removal:
   [root@compute-0 ~]# lsof -n | grep dm-
   systemd-u 9576            root    8r      BLK              253,3 0x4ffff0000      40041 /dev/dm-3

8. It's udev. Print a thread dump, what is systemd-udev doing:
   [root@compute-0 ~]# echo t > /proc/sysrq-trigger
   
   [...]
   
   [  143.410650] systemd-udevd   D ffff88fee3e6d080     0  9576   1166 0x00000080
   [  143.410652]  ffff887ee4c8fb90 0000000000000086 ffff887ee57c2e00 ffff887ee4c8ffd8
   [  143.410655]  ffff887ee4c8ffd8 ffff887ee4c8ffd8 ffff887ee57c2e00 ffff88bf3fb94780
   [  143.410659]  0000000000000000 7fffffffffffffff ffffffff81168950 ffff887ee4c8fcf0
   [  143.410663] Call Trace:
   [  143.410665]  [<ffffffff81168950>] ? sleep_on_page+0x20/0x20
   [  143.410667]  [<ffffffff8163a9c9>] schedule+0x29/0x70
   [  143.410669]  [<ffffffff816386b9>] schedule_timeout+0x209/0x2d0
   [  143.410672]  [<ffffffff8101c829>] ? read_tsc+0x9/0x10
   [  143.410674]  [<ffffffff81168950>] ? sleep_on_page+0x20/0x20
   [  143.410676]  [<ffffffff81639ffe>] io_schedule_timeout+0xae/0x130
   [  143.410679]  [<ffffffff8163a098>] io_schedule+0x18/0x20
   [  143.410681]  [<ffffffff8116895e>] sleep_on_page_killable+0xe/0x40
   [  143.410683]  [<ffffffff8163897b>] __wait_on_bit_lock+0x5b/0xc0
   [  143.410685]  [<ffffffff81168af8>] __lock_page_killable+0x78/0xa0
   [  143.410687]  [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
   [  143.410689]  [<ffffffff8116ad6e>] generic_file_aio_read+0x50e/0x750
   [  143.410694]  [<ffffffff8121969c>] blkdev_aio_read+0x4c/0x70
   [  143.410696]  [<ffffffff811ddcad>] do_sync_read+0x8d/0xd0
   [  143.410698]  [<ffffffff811de40c>] vfs_read+0x9c/0x170
   [  143.410700]  [<ffffffff811def5f>] SyS_read+0x7f/0xe0
   [  143.410702]  [<ffffffff81645a49>] system_call_fastpath+0x16/0x1b

Another similar trace is seen in https://bugzilla.redhat.com/show_bug.cgi?id=1173739#c9.