ramfs from ubuntu_stress_smoke_test failed on F-OEM-5.6

Bug #1879447 reported by Po-Hsu Lin
This bug affects 2 people
Affects                        Status        Importance  Assigned to     Milestone
Stress-ng                      Fix Released  Critical    Colin Ian King
ubuntu-kernel-tests            Fix Released  Undecided   Unassigned
linux-signed-oem-5.6 (Ubuntu)  Invalid       Undecided   Unassigned

Bug Description

Issue found on node "kili"
 ramfs STARTING
 ramfs RETURNED 2
 ramfs FAILED
 stress-ng: debug: [1471873] 56 processors online, 56 processors configured
 stress-ng: info: [1471873] dispatching hogs: 4 ramfs
 stress-ng: debug: [1471873] cache allocate: default cache size: 35840K
 stress-ng: debug: [1471873] starting stressors
 stress-ng: debug: [1471874] stress-ng-ramfs: started [1471874] (instance 0)
 stress-ng: debug: [1471875] stress-ng-ramfs: started [1471875] (instance 1)
 stress-ng: debug: [1471873] 4 stressors spawned
 stress-ng: debug: [1471876] stress-ng-ramfs: started [1471876] (instance 2)
 stress-ng: debug: [1471877] stress-ng-ramfs: started [1471877] (instance 3)
 stress-ng: debug: [1471874] stress-ng-ramfs: exited [1471874] (instance 0)
 stress-ng: debug: [1471875] stress-ng-ramfs: exited [1471875] (instance 1)
 stress-ng: debug: [1471876] stress-ng-ramfs: exited [1471876] (instance 2)
 stress-ng: debug: [1471873] process [1471874] terminated
 stress-ng: debug: [1471877] stress-ng-ramfs: exited [1471877] (instance 3)
 stress-ng: debug: [1471873] process [1471875] terminated
 stress-ng: debug: [1471873] process [1471876] terminated
 stress-ng: debug: [1471873] process [1471877] terminated
 stress-ng: info: [1471873] successful run completed in 6.00s
 stress-ng: fail: [1471873] ramfs instance 0 corrupted bogo-ops counter, 8689 vs 8688
 stress-ng: fail: [1471873] ramfs instance 0 hash error in bogo-ops counter and run flag, 2516733363 vs 3982935467
 stress-ng: fail: [1471873] ramfs instance 1 corrupted bogo-ops counter, 8718 vs 8717
 stress-ng: fail: [1471873] ramfs instance 1 hash error in bogo-ops counter and run flag, 2527989421 vs 544860162
 stress-ng: fail: [1471873] ramfs instance 2 corrupted bogo-ops counter, 8735 vs 8734
 info: 5 failures reached, aborting stress process
 stress-ng: fail: [1471873] ramfs instance 2 hash error in bogo-ops counter and run flag, 2400865158 vs 4125800824
 stress-ng: fail: [1471873] ramfs instance 3 corrupted bogo-ops counter, 8688 vs 8687
 stress-ng: fail: [1471873] ramfs instance 3 hash error in bogo-ops counter and run flag, 3982935467 vs 3855565640
 stress-ng: fail: [1471873] metrics check: stressor metrics corrupted, data is compromised

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.6.0-1010-oem 5.6.0-1010.10
ProcVersionSignature: Ubuntu 5.6.0-1010.10-oem 5.6.8
Uname: Linux 5.6.0-1010-oem x86_64
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
CasperMD5CheckResult: skip
Date: Tue May 19 07:03:09 2020
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-oem-5.6
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :
tags: added: sru
tags: added: sru-20200427 ubuntu-stress-smoke-test
removed: sru
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

BTW, this can only be reproduced on node kili (3 out of 4 attempts); it passed on node onibi.

Revision history for this message
Colin Ian King (colin-king) wrote :

It was due to a racy termination where the child didn't terminate after the parent thought it had. I've fixed this:

https://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=97aaa339ed3cfa22e73f2dcbd0bac26b00f3a0e3

The bug was caught by some extra bogo op sanity checking I added yesterday.
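
To illustrate how a check of this kind can expose a late-running child, here is a minimal sketch in Python. It is not the stress-ng implementation: the `StressorStats` record, the `checkpoint`/`verify` helpers, and the use of CRC32 as the hash are all assumptions made for illustration. The idea is that the child records a snapshot of its bogo-ops counter and run flag when it believes it is done, and the parent later compares the live values against that snapshot. If the child performs one more bogo-op after the snapshot, both the counter comparison and the hash comparison fail, which matches the shape of the "8689 vs 8688" messages in the log.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class StressorStats:
    """Hypothetical per-instance record shared between child and parent."""
    counter: int = 0      # bogo-ops counter, bumped by the child
    run_ok: bool = False  # run flag set by the child on a clean run
    ck_counter: int = 0   # copy of counter taken at the checkpoint
    ck_hash: int = 0      # hash of (counter, run_ok) taken at the checkpoint


def stats_hash(counter: int, run_ok: bool) -> int:
    # CRC32 is an arbitrary stand-in; the real tool's hash is not shown here.
    return zlib.crc32(f"{counter}:{int(run_ok)}".encode())


def checkpoint(stats: StressorStats) -> None:
    """Child side: record a snapshot when the child believes it is finished."""
    stats.ck_counter = stats.counter
    stats.ck_hash = stats_hash(stats.counter, stats.run_ok)


def verify(stats: StressorStats) -> List[str]:
    """Parent side: report mismatches between live values and the snapshot."""
    errors = []
    if stats.counter != stats.ck_counter:
        errors.append(
            f"corrupted bogo-ops counter, {stats.counter} vs {stats.ck_counter}"
        )
    if stats_hash(stats.counter, stats.run_ok) != stats.ck_hash:
        errors.append("hash error in bogo-ops counter and run flag")
    return errors


# Clean shutdown: nothing touches the counter after the checkpoint.
clean = StressorStats(counter=8688, run_ok=True)
checkpoint(clean)

# Racy shutdown: one more bogo-op lands after the checkpoint was taken.
racy = StressorStats(counter=8688, run_ok=True)
checkpoint(racy)
racy.counter += 1

print(verify(clean))
print(verify(racy))
```

In this sketch the clean record verifies with no errors, while the racy record reports both a counter mismatch and a hash mismatch.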

Changed in stress-ng:
status: New → Fix Committed
importance: Undecided → Critical
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Colin Ian King (colin-king) wrote :

Please re-test, I believe the fix addresses this issue.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hello Colin,
I can still see rlimit failing on Focal, with:
Test suite HEAD SHA1: c6e6372

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed-oem-5.6 (Ubuntu):
status: New → Confirmed
Revision history for this message
Colin Ian King (colin-king) wrote :

I've pushed another fix for this. A process being OOM'd when stress-ng is being run with the --oomable flag is actually an expected termination point, so I'm making it return EXIT_SUCCESS with the fix:

https://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=b5e7448bd7a12426fb47df6809fa9e5b3bcbdfef
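
The policy described above can be sketched as a small exit-code mapping. This is an illustration only, not the code from the commit: the function name `report_exit_code` is hypothetical, the return code follows Python's `subprocess` convention (a negative value `-N` means the child was killed by signal `N`), and treating any SIGKILL as an OOM kill is a simplifying assumption. Only the `--oomable` flag and the EXIT_SUCCESS outcome come from the comment.

```python
import signal

EXIT_SUCCESS = 0
EXIT_FAILURE = 1


def report_exit_code(returncode: int, oomable: bool) -> int:
    """Map a child's return code to the code the parent reports.

    returncode uses the subprocess convention: >= 0 is a normal exit
    status, -N means the child was terminated by signal N.
    """
    if returncode >= 0:
        # Normal exit: pass the child's own status through unchanged.
        return returncode
    killed_by = -returncode
    # Assumption: a SIGKILL is taken to mean the OOM killer struck.
    if killed_by == signal.SIGKILL and oomable:
        # With oomable set, being OOM-killed is an expected way to end.
        return EXIT_SUCCESS
    return EXIT_FAILURE


print(report_exit_code(-signal.SIGKILL, oomable=True))
print(report_exit_code(-signal.SIGKILL, oomable=False))
```

The design point is that the same termination (a kill by signal) is classified differently depending on whether the user opted in to OOM kills, so an expected kill no longer shows up as a test failure.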

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Retesting all failed tests on different kernels now.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Test results are looking good.

This test has passed on different nodes with different kernels, except for one combination:
node kili + Trusty kernel. I will file a separate bug for this.

Thanks

Changed in stress-ng:
status: Fix Committed → Fix Released
Changed in linux-signed-oem-5.6 (Ubuntu):
status: Confirmed → Invalid
Changed in ubuntu-kernel-tests:
status: New → Fix Released