stress-ng vm stressor failing in version 0.07.28 on some systems
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Stress-ng | Fix Released | High | Colin Ian King | | |
Bug Description
The 0.07.28 version of stress-ng is producing what may be spurious failures on the vm stressor on two systems: hogplum (a Dell PowerEdge T610 with 64 GiB of RAM) and wildorange (an IBM x3650 M2 with 56 GiB of RAM). An example run, on wildorange, looks like this:
$ sudo stress-ng -k --aggressive --verify --timeout 860 --vm 0
stress-ng: info: [7078] dispatching hogs: 8 vm
stress-ng: fail: [7093] flip: detected 24 memory errors
stress-ng: fail: [7093] rand-set: detected 24 memory errors
stress-ng: fail: [7093] ror: detected 24 memory errors
stress-ng: fail: [7093] swap bytes: detected 192 memory errors
stress-ng: fail: [7085] flip: detected 24 memory errors
stress-ng: fail: [7094] flip: detected 24 memory errors
stress-ng: fail: [7085] rand-set: detected 24 memory errors
stress-ng: fail: [7085] ror: detected 24 memory errors
stress-ng: fail: [7094] rand-set: detected 24 memory errors
stress-ng: fail: [7085] swap bytes: detected 192 memory errors
stress-ng: fail: [7094] ror: detected 24 memory errors
stress-ng: fail: [7094] swap bytes: detected 192 memory errors
stress-ng: fail: [7089] flip: detected 24 memory errors
stress-ng: fail: [7091] flip: detected 24 memory errors
stress-ng: fail: [7089] rand-set: detected 24 memory errors
stress-ng: fail: [7083] flip: detected 24 memory errors
stress-ng: fail: [7091] rand-set: detected 24 memory errors
stress-ng: fail: [7089] ror: detected 24 memory errors
stress-ng: fail: [7091] ror: detected 16 memory errors
stress-ng: fail: [7089] swap bytes: detected 192 memory errors
stress-ng: fail: [7091] swap bytes: detected 192 memory errors
stress-ng: fail: [7083] rand-set: detected 24 memory errors
stress-ng: fail: [7083] ror: detected 24 memory errors
stress-ng: fail: [7087] flip: detected 24 memory errors
stress-ng: fail: [7083] swap bytes: detected 192 memory errors
stress-ng: fail: [7081] flip: detected 24 memory errors
stress-ng: fail: [7087] rand-set: detected 24 memory errors
stress-ng: fail: [7087] ror: detected 24 memory errors
stress-ng: fail: [7081] rand-set: detected 24 memory errors
stress-ng: fail: [7081] ror: detected 24 memory errors
stress-ng: fail: [7087] swap bytes: detected 192 memory errors
stress-ng: fail: [7081] swap bytes: detected 192 memory errors
stress-ng: fail: [7079] stress-ng-vm: detected 264 bit errors while stressing memory
stress-ng: error: [7078] process 7079 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: fail: [7082] stress-ng-vm: detected 264 bit errors while stressing memory
stress-ng: fail: [7080] stress-ng-vm: detected 264 bit errors while stressing memory
stress-ng: error: [7078] process 7080 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: error: [7078] process 7082 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: fail: [7084] stress-ng-vm: detected 264 bit errors while stressing memory
stress-ng: error: [7078] process 7084 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: fail: [7090] stress-ng-vm: detected 264 bit errors while stressing memory
stress-ng: fail: [7088] stress-ng-vm: detected 256 bit errors while stressing memory
stress-ng: fail: [7086] stress-ng-vm: detected 264 bit errors while stressing memory
stress-ng: error: [7078] process 7086 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: error: [7078] process 7088 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: error: [7078] process 7090 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: fail: [7092] stress-ng-vm: detected 264 bit errors while stressing memory
stress-ng: error: [7078] process 7092 (stress-ng-vm) terminated with an error, exit status=1
stress-ng: info: [7078] unsuccessful run completed in 860.06s (14 mins, 20.06 secs)
The exact pattern of failures varies from one run to the next; some other examples are available at:
* https:/
* https:/
* https:/
* https:/
Hogplum passes with stress-ng 0.07.21 under both Ubuntu 16.04.2 and 17.04, whereas stress-ng 0.07.28 fails under both Ubuntu versions. I've tested wildorange less extensively.
To date, I have NOT encountered this problem on other systems, but most of the other test systems have significantly less RAM -- usually 4-8 GiB. One notable exception is lalande, a Dell PowerEdge C6320p with 64 GiB that passes the vm stressor but fails the brk stressor. (I'm still investigating that failure and may file another bug report.)
I've been running memtest86+ on hogplum over the weekend (about 70 hours). It has completed three passes and is most of the way through a fourth, with no errors so far. Of course, it's possible that stress-ng's vm stressor is uncovering a legitimate problem that memtest86+ is missing, but the replication of the same failure on two systems and the failure of memtest86+ to uncover any problems make this look like it may be a stress-ng bug.
tags: | added: hwcert-server |
Changed in stress-ng: | |
status: | New → Triaged |
status: | Triaged → Confirmed |
importance: | Undecided → High |
assignee: | nobody → Colin Ian King (colin-king) |
Changed in stress-ng: | |
status: | Fix Committed → Fix Released |
Tracked this down to an optimization regression in the fast pseudo-random number generator: it seems that 8-bit and 16-bit random values were being cached and not flushed on a re-seed.
Fix committed: http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=79aa85597f2f7aa944d4d1d1c59fed49955c2ad7
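For anyone wondering how a stale cache can look like bad RAM, here is a minimal sketch of that class of bug. It is not the stress-ng code: the class FastRand, its next8 method, the count_mismatches helper and the LCG constants are all hypothetical, chosen only for illustration. It assumes a verify pass that writes bytes from a seeded generator, re-seeds with the same seed, and expects to replay the identical sequence.

```python
class FastRand:
    """Toy 32-bit generator with a byte cache (hypothetical, not stress-ng's code)."""

    def __init__(self, seed, flush_on_reseed):
        self.flush_on_reseed = flush_on_reseed
        self.state = 0
        self.cache = 0   # unconsumed bytes from the last 32-bit draw
        self.cached = 0  # number of bytes still held in the cache
        self.reseed(seed)

    def reseed(self, seed):
        self.state = seed & 0xFFFFFFFF
        if self.flush_on_reseed:
            # Drop bytes that were derived from the old state.
            self.cached = 0

    def next32(self):
        # Simple LCG step; any 32-bit generator would do for this sketch.
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

    def next8(self):
        # Serve 8-bit values out of one cached 32-bit draw.
        if self.cached == 0:
            self.cache = self.next32()
            self.cached = 4
        value = self.cache & 0xFF
        self.cache >>= 8
        self.cached -= 1
        return value


def count_mismatches(flush_on_reseed, seed=0, nbytes=5):
    """Write nbytes from the generator, re-seed, replay, and count differences.

    nbytes is deliberately not a multiple of 4, so the write pass leaves
    unconsumed bytes in the cache when the re-seed happens.
    """
    rng = FastRand(seed, flush_on_reseed)
    written = [rng.next8() for _ in range(nbytes)]   # "write" pass
    rng.reseed(seed)
    expected = [rng.next8() for _ in range(nbytes)]  # "verify" pass
    return sum(w != e for w, e in zip(written, expected))


if __name__ == "__main__":
    print("flushed on re-seed:", count_mismatches(True), "mismatches")
    print("cache kept on re-seed:", count_mismatches(False), "mismatches")
```

With the cache flushed, the verify pass replays the same bytes and reports no mismatches. With the cache kept, the first values after the re-seed come from leftover bytes of the previous sequence, so the comparison fails even though the memory itself is fine.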