sysfs test in ubuntu_stress_smoke_test will hang on a X-HWE lowlatency node

Bug #1791246 reported by Po-Hsu Lin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
Fix Released
Undecided
Unassigned
stress-ng (Ubuntu)
Fix Released
High
Colin Ian King

Bug Description

ubuntu@pepe:~$ uname -a
Linux pepe 4.15.0-34-lowlatency #37~16.04.1-Ubuntu SMP PREEMPT Tue Aug 28 11:33:58 UTC 2018 i686 i686 i686 GNU/Linux

The sysfs test in ubuntu_stress_smoke_test will hang on this node.
Sep 7 06:50:01 pepe stress-ng: invoked with './stress-n' by user 0
Sep 7 06:50:01 pepe stress-ng: system: 'pepe' Linux 4.15.0-34-lowlatency #37~16.04.1-Ubuntu SMP PREEMPT Tue Aug 28 11:33:58 UTC 2018 i686
Sep 7 06:50:01 pepe stress-ng: memory (MB): total 3989.54, free 3595.64, shared 14.70, buffer 92.63, swap 1023.99, free swap 940.58
Sep 7 06:50:01 pepe stress-ng: info: [27744] dispatching hogs: 4 sysinfo
Sep 7 06:50:06 pepe stress-ng: info: [27744] successful run completed in 5.01s
Sep 7 06:50:06 pepe stress-ng: invoked with './stress-n' by user 0
Sep 7 06:50:06 pepe stress-ng: system: 'pepe' Linux 4.15.0-34-lowlatency #37~16.04.1-Ubuntu SMP PREEMPT Tue Aug 28 11:33:58 UTC 2018 i686
Sep 7 06:50:06 pepe stress-ng: memory (MB): total 3989.54, free 3594.25, shared 14.70, buffer 92.64, swap 1023.99, free swap 940.58
Sep 7 06:50:06 pepe stress-ng: info: [27763] dispatching hogs: 4 sysfs
Sep 7 06:50:09 pepe kernel: [ 1014.787741] WARNING! power/level is deprecated; use power/control instead

There are many repetitive message in dmesg right after triggering the test with:
$ ./stress-ng -t 5 --sysfs 4 --ignite-cpu --syslog --verbose --verify --oomable
[ 1655.642421] mpt2sas_cm0: _ctl_BRM_status_show: BRM attribute is only for warpdrive
[ 1655.642429] mpt2sas_cm0: _ctl_BRM_status_show: BRM attribute is only for warpdrive
[ 1655.642440] mpt2sas_cm0: _ctl_BRM_status_show: BRM attribute is only for warpdrive
[ 1655.642454] mpt2sas_cm0: _ctl_BRM_status_show: BRM attribute is only for warpdrive
[ 1655.642463] mpt2sas_cm0: _ctl_BRM_status_show: BRM attribute is only for warpdrive

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: stress-ng (not installed)
ProcVersionSignature: User Name 4.15.0-34.37~16.04.1-lowlatency 4.15.18
Uname: Linux 4.15.0-34-lowlatency i686
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: i386
Date: Fri Sep 7 06:57:16 2018
SourcePackage: stress-ng
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

Hangs when accessing:

/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/version_product
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/prot_guard_type
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/ioc_reset_count
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/version_bios
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/sg_tablesize
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/device_delay
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/board_tracer
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/host_sas_address
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/diag_trigger_mpi
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/logging_level
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/fwfault_debug
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/prot_capabilities
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/unchecked_isa_dma
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/reply_queue_count
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/sg_prot_tablesize
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/host_trace_buffer

..so probably the host_trace_buffer is the problematic sysfs file.

Revision history for this message
Colin Ian King (colin-king) wrote :

And lots of spamming of the kernel log with:

[ 456.583067] mpt2sas_cm0: _ctl_host_trace_buffer_show: host_trace_buffer is not registered
[ 456.583078] mpt2sas_cm0: _ctl_host_trace_buffer_show: host_trace_buffer is not registered
[ 456.583100] mpt2sas_cm0: _ctl_host_trace_buffer_show: host_trace_buffer is not registered
[ 456.583121] mpt2sas_cm0: _ctl_host_trace_buffer_show: host_trace_buffer is not registered
[ 456.583138] mpt2sas_cm0: _ctl_host_trace_buffer_show: host_trace_buffer is not registered
[ 456.583155] mpt2sas_cm0: _ctl_host_trace_buffer_show: host_trace_buffer is not registered
[ 456.583172] mpt2sas_cm0: _ctl_host_trace_buffer_show: host_trace_buffer is not registered

Revision history for this message
Colin Ian King (colin-king) wrote :

So it appears that the mpt2sas driver is heavily spamming kernel messages and this causes a massive backlog. The low-latency kernel trying to log these to the backing store and this is causing more load.

I've changed the messages to be pr_err_ratelimited and the issue goes away. I'll send a fix for the driver upstream. Meanwhile, since this is triggered only by hard hammering of the sysfs entries I think I'll add a autodetection of kernel log spamming in the stressor and flag this automatically.

Revision history for this message
Colin Ian King (colin-king) wrote :

Worked around the above issue, now hitting the underlying issue:

[ 205.914807] CPU: 2 PID: 2685 Comm: stress-ng-sysfs Not tainted 4.15.0-34-lowlatency #37~16.04.1-Ubuntu
[ 205.914809] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.8.2 08/17/2011
[ 205.914811] Call Trace:
[ 205.914817] dump_stack+0x58/0x81
[ 205.914823] dump_header+0x5a/0x251
[ 205.914829] ? has_ns_capability_noaudit+0x2c/0x40
[ 205.914835] ? ___ratelimit+0x9f/0xf0
[ 205.914839] oom_kill_process+0x227/0x420
[ 205.914842] out_of_memory+0xe1/0x2b0
[ 205.914846] __alloc_pages_slowpath+0xbdc/0xca0
[ 205.914850] __alloc_pages_nodemask+0x249/0x280
[ 205.914855] new_slab+0x4da/0x620
[ 205.914859] ___slab_alloc.constprop.76+0x4c0/0x580
[ 205.914863] ? seq_open+0x2d/0x80
[ 205.914867] ? kernfs_fop_open+0xbc/0x390
[ 205.914872] ? try_to_wake_up+0x45/0x460
[ 205.914878] ? mutex_spin_on_owner+0xa5/0xc0
[ 205.914882] __slab_alloc.isra.69.constprop.75+0x30/0x50
[ 205.914885] ? __slab_alloc.isra.69.constprop.75+0x30/0x50
[ 205.914889] kmem_cache_alloc_trace+0x17f/0x1c0
[ 205.914892] ? seq_open+0x2d/0x80
[ 205.914896] seq_open+0x2d/0x80
[ 205.914899] kernfs_fop_open+0x18d/0x390
[ 205.914905] do_dentry_open+0x1ca/0x2f0
[ 205.914908] ? kernfs_seq_start+0x90/0x90
[ 205.914911] vfs_open+0x41/0x70
[ 205.914916] path_openat+0x250/0x1240
[ 205.914919] ? getname_flags+0x3a/0x1a0
[ 205.914924] ? aa_file_perm+0x86/0x350
[ 205.914927] do_filp_open+0x68/0xe0
[ 205.914930] ? aa_file_perm+0x86/0x350
[ 205.914935] ? __alloc_fd+0x95/0x150
[ 205.914938] do_sys_open+0x111/0x2a0
[ 205.914942] ? mem_cgroup_oom_synchronize+0x350/0x360

Revision history for this message
Colin Ian King (colin-king) wrote :

And got OOM'd at this point:

/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/target4:1:0/power/control
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/target4:1:0/power/runtime_enabled
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/supported_mode
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/fw_queue_depth
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/board_assembly
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/version_nvdata_persistent
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/version_nvdata_default
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host4/scsi_host/host4/host_trace_buffer_size
Connection to 10.246.72.34 closed by remote host.

Revision history for this message
Colin Ian King (colin-king) wrote :

Ah, OOM'ing was because the stressor was skipping some sysfs file entries and not free'ing them, I've pushed a fix to resolve that issue too. My bad.

Fix: http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=da16b8c7220fbfbb4d5a2896ae428584b0838cae

Can you re-test and see if the two stress-ng fixes address the issue?

Changed in stress-ng (Ubuntu):
status: New → In Progress
status: In Progress → Fix Committed
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hi Colin,
I've tested this on the node pepe, but it's still spamming the message:
[ 904.500539] mpt2sas_cm0: _ctl_host_trace_buffer_size_show: host_trace_buffer is not registered

Revision history for this message
Colin Ian King (colin-king) wrote :

The spamming should in theory be reduced and the test should get through to completion. The stress-ng change just throttles back the exercising of the /sysfs file that causes execessive spamming. However, there are quite a few of these files that emit messages, so it's impossible to reduce these to zero.

Revision history for this message
Colin Ian King (colin-king) wrote :

I've added a fix to backoff when the sysfs stressor detects heavier spamming of the kmsg log,

http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=339c9f99030be84489b3c448750bec338030cf49

This has been run through the ADT testing manually and no longer blocks testing to completion. So I think this final fix nails the bug.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Marking this as fix-released.
Thanks Colin!

Changed in stress-ng (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-kernel-tests:
status: New → Fix Released
no longer affects: linux-hwe (Ubuntu)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.