reboot/poweroff stress test failed but the result in checkbox report is passed

Bug #1161718 reported by Jerry Kao
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Checkbox
Fix Released
High
Daniel Manrique

Bug Description

Test poweroff stress for 100 cycles on HP kirby and walle platform. System shut down and never boot up automatically. Need to boot up system manually. After bootup system, re-execute checkbox and re-run test case, checkbox retrive report directly and test result of poweroff is passed.
Execute poweroff test by pm_test manully, the test is fail also.

report uploaded on certificatin website
https://certification.canonical.com/hardware/201301-12624/submission/90153/

revision of checkbox
ii checkbox 0.15.3~ppa+bzr1930+201302151616~precise1 System testing application
ii checkbox-certification 0.15.3~ppa+bzr569+201302212159~precise1 Checkbox Certification Tests
ii checkbox-gtk 0.15.3~ppa+bzr1930+201302151616~precise1 GTK interface for checkbox
ii checkbox-oem 0.46-0-precise1~rev616 Checkbox OEM Tests
ii checkbox-oem-bug 1.0-2-precise1~rev4 Checkbox OEM bug
ii checkbox-oem-gtk 0.46-0-precise1~rev616 Checkbox OEM GTK Interface
ii checkbox-oem-stella 0.1-1precise1~rev20 stella System Testing
ii checkbox-oem-stella-gtk 0.1-1precise1~rev20 Checkbox stella GTK Interface
ii checkbox-oem-stella-urwid 0.1-1precise1~rev20 Checkbox stella Urwid Interface
ii checkbox-oem-urwid 0.46-0-precise1~rev616 Checkbox OEM Urwid CLI
ii checkbox-qt 0.13.9 QT4 interface for checkbox
ii checkbox-urwid 0.15.3~ppa+bzr1930+201302151616~precise1 Urwid interface for checkbox

Related branches

Revision history for this message
Jerry Kao (jerry.kao) wrote :

checkbox log

Revision history for this message
Jerry Kao (jerry.kao) wrote :

pm_test poweroff log

description: updated
Revision history for this message
Jerry Kao (jerry.kao) wrote :

whitelist

description: updated
Revision history for this message
Jeff Lane  (bladernr) wrote :

The script uses the system wakealarm IRQ as a convenience to trigger the power-on after X seconds. Not every system honors the WAKEALARM IRQ however.

If the system can successfully shut down and be powered back on, by any means, this is a successful test. The use of the wakealarm is simply to help keep the tester from getting a sore finger from having to push the power button 100 or 250 times.

It IS possible you could get this resolved with a BIOS fix, perhaps?

As a final test, does the multi-cycle hibernate/resume test work?

I would imagine (you'd need to confirm this with someone more knowledgable though) that it would be possible that the wakealarm would work in a suspend/resume situation because the system remains at least partially powered on, where in a power-off or hibernate/resume situation, the system is in a completely shutdown power state and may NOT change states via wakealarm.

Changed in checkbox:
status: New → Incomplete
importance: Undecided → Low
Revision history for this message
Jeffrey Chang (modern911) wrote :

I think the problem to us is even we could manually press power button to wake the system,
the test will be "pass" after on poweroff cycle.
And it didn't tell the truth that test is not complete in any test log, and report a wrong result.

I believe there's still some problem in the script.
At least, we need to know there's a problem from the test result.

Jerry Kao (jerry.kao)
summary: - Poweroff stress test failed but the result in checkbox report is passed
+ reboot/oweroff stress test failed but the result in checkbox report is
+ passed
Revision history for this message
Jerry Kao (jerry.kao) wrote : Re: reboot/oweroff stress test failed but the result in checkbox report is passed

The issue also happen with reboot stress test.

In this case, reboot test did not finish but the result is passed.
Please check the test result, checkbox log and reboot stress log.

Revision history for this message
Jerry Kao (jerry.kao) wrote :
summary: - reboot/oweroff stress test failed but the result in checkbox report is
+ reboot/poweroff stress test failed but the result in checkbox report is
passed
Revision history for this message
Jerry Kao (jerry.kao) wrote :
Revision history for this message
Jerry Kao (jerry.kao) wrote :
Changed in checkbox:
status: Incomplete → New
Revision history for this message
Jerry Kao (jerry.kao) wrote :

Test on Bambi 1 (HP Pavilion 14 PC) with Checkbox 0.16.1~ppa+bzr1968+201405071900 for reboot stress test.
The pm_test reboot test is fail. System can not boot up itself. But in checkbox report it's passed.
https://certification.canonical.com/hardware/201301-12661/submission/91569/test-results/pass/

tags: added: ce-qa-concern
Changed in checkbox:
status: New → Incomplete
Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

It is *not* a criteria of the test that the systems boots itself up automatically, just that it both shuts down cleanly and powers on again cleanly. Then, following the wording of the test, you should run checkbox and select 'Re-run' when prompted, then select either 'Yes' or 'No' depending on what the tester observed. An example of a failure would be as I've seen on my own system in the past, where selecting Shutdown does not power the system all the way down, and the power has to be removed from the hardware to let it do that - in that case the test would fail.

Revision history for this message
Jerry Kao (jerry.kao) wrote :

@Brendan
Please check the attached log which is tested this week with Bambi1. The reboot test was not finished and system restart as reboot test was completed. System stay at desktop as normal. When re-launch checkbox and select "rerun", the test case was poweroff which is right after reboot.
Tester can not aware any odd and keep running the test. The result is also passed of reboot. But when check the log of reboot test, found the reboot was just executed once.
So, my point is, the test is just executed once or the log is even fail, but the result is passed in checkbox report. Normally, tester can not aware the abnormal and just go ahead. The issue/bug probably will not be discovery.

Changed in checkbox:
status: Incomplete → New
Revision history for this message
Jerry Kao (jerry.kao) wrote :

The issue happen on vovic 1 today. The reboot test was run just once and result was passed.

Changed in checkbox:
status: New → Confirmed
Changed in checkbox:
importance: Low → High
Revision history for this message
Ara Pulido (ara) wrote :

Daniel, can you have a look to this bug, please?

Changed in checkbox:
assignee: nobody → Daniel Manrique (roadmr)
status: Confirmed → Triaged
Revision history for this message
Daniel Manrique (roadmr) wrote :
Download full text (4.4 KiB)

OK, ignoring several thousand lines of log file, this is the first run of stress/poweroff, and subsequent start of checkbox after it's run. Notice the jump in timestamps, I assume it took from 10:34 to 12:01 to run the 100 poweroff iterations.

Things to notice:
- Checkbox tries to execute stress/poweroff, then sends a few messages to the backend (the one that actually executes the commands, the first message is a confirmation ping, message 22 is the actual command). The system reboots after that, so checkbox just disappears.
- At 12:01 checkbox starts again and reads the configuration. See a bit further below for more logs...

2013-04-11 10:34:14,961 DEBUG Started firing message-exec.
2013-04-11 10:34:15,030 DEBUG Calling <string> MessageInfo.message_exec({'status': 'uninitiated', 'description': 'Stress poweroff s
ystem (100 cycles)', 'plugin': 'shell', 'requires': ["package.name == 'upstart'", "package.name == 'fwts'"], 'command': 'pm_test -r 100 --sile
nt --log-level=notset poweroff', 'user': 'root', 'suite': '__stress__', 'type': 'test', 'resources': [{'status': 'Installed', 'desired': 'Inst
all', 'version': '1.5-0ubuntu7.2', 'name': 'upstart', 'description': 'event-based init daemon'}, {'status': 'Installed', 'desired': 'Install',
 'version': '0.25.06precise1', 'name': 'fwts', 'description': 'FirmWare Test Suite'}], 'name': 'stress/poweroff'}) for message-exec with prior
ity 0.
2013-04-11 10:34:15,030 INFO Executing stress/poweroff
2013-04-11 10:34:15,031 DEBUG Calling <string> BackendInfo.message_exec({'status': 'uninitiated', 'description': 'Stress poweroff system (100 cycles)', 'plugin': 'shell', 'requires': ["package.name == 'upstart'", "package.name == 'fwts'"], 'command': 'pm_test -r 100 --silent --log-level=notset poweroff', 'user': 'root', 'suite': '__stress__', 'type': 'test', 'resources': [{'status': 'Installed', 'desired': 'Install', 'version': '1.5-0ubuntu7.2', 'name': 'upstart', 'description': 'event-based init daemon'}, {'status': 'Installed', 'desired': 'Install', 'version': '0.25.06precise1', 'name': 'fwts', 'description': 'FirmWare Test Suite'}], 'name': 'stress/poweroff'}) for message-exec with priority 0.
2013-04-11 10:34:15,031 DEBUG Sending message with sequence number 21 to backend
2013-04-11 10:34:15,031 DEBUG Expecting sequence number 21 from backend, got sequence number 21
2013-04-11 10:34:15,031 DEBUG Sending message with sequence number 22 to backend
2013-04-11 12:01:45,952 INFO Reading configurations from: /usr/share/checkbox-oem-stella/configs/checkbox-oem-stella-gtk.ini

OK, so here Checkbox finished processing everything and is now rerunning stress/poweroff. As reported, notice that the test is reported as passed, but it took 12.33 seconds to run, which is strange. So this validates the notion that the job itself is marked as passed if it's run again by checkbox.

2013-04-11 12:02:02,807 INFO Executing stress/poweroff
2013-04-11 12:02:02,807 DEBUG Calling <string> BackendInfo.message_exec({'status': 'uninitiated', 'description': 'Stress poweroff system (100 cycles)', 'plugin': 'shell', 'requires': ["package.name == 'upstart'", "package.name...

Read more...

Revision history for this message
Daniel Manrique (roadmr) wrote :

A possible solution for this problem is below, but first an analysis and explanation:

This is the stella stress whitelist. I'm removing the resource and local jobs, the important ones are:

power-management/rtc
power-management/hibernate_advanced
power-management/hibernate-single-log-check
power-management/hibernate-single-log-attach

Notice all these tests are in sets of three: first, the actual functional test, followed by a "check" test that parses the log and produces a pass/fail with some diagnostics, and finally a job that attaches the complete logs for later review.

suspend/suspend_advanced
suspend/suspend-single-log-check
suspend/suspend-single-log-attach
power-management/suspend_30_cycles
power-management/suspend-30-cycles-log-check
power-management/suspend-30-cycle-log-attach
power-management/suspend-30-cycles-time-check
power-management/hibernate_30_cycles
power-management/hibernate-30-cycles-log-check
power-management/hibernate-30-cycle-log-attach
__stress__
stress/cpu_stress_test

Up to this point notice how all the jobs (except cpu_stress_test which is different) follow the same logic, even the 30-cycle jobs do this.

For the reboot and poweroff stress tests (100 automated iterations each), notice we only have two jobs. The functional job (reboot or poweroff) will always pass. This is by design, because the command's return code only reflects whether the command launched correctly, and without parsing the log file, checkbox can't determine whether the test actually failed.

Then the _log test is the one that compresses and attaches the logs for the submission.

stress/reboot
stress/reboot_log
stress/poweroff
stress/poweroff_log

Notice that the check jobs are missing. The check jobs are the ones that will actually produce a pass/fail evaluation, based on results from the log file.

The way to fix this problem, then, is to add the following jobs to the stress whitelist (or any whitelist that uses stress/reboot):

stress/reboot_check
stress/poweroff_check

So:
1- Testers should ignore stress/reboot and stress/poweroff result. They will always indicate "pass", unless the pm_test command fails in the first invocation (should never happen).
2- Add the _check jobs to your whitelists. This has to be done in checkbox-oem-stella or any package that contains whitelists with these jobs.
3- Testers can look at results of stress/reboot_check and stress/poweroff_check to determine if the functionality is working fine.

I'll set this bug to incomplete awaiting feedback. Please let me know if this solution is acceptable. If not, we can try to implement something else. But because of how checkbox and the pm_test script are designed, this is possibly the easiest/best solution.

Changed in checkbox:
status: Triaged → Incomplete
Revision history for this message
Jerry Kao (jerry.kao) wrote :

@Daniel

Thanks for your suggestion.
It does work to discover the abnormal of reboot and poweroff stree test.

I had a test with Stella vDT Greenwood for trial. The poweroff stress test is passed but the log message is as follow
2013-09-12 09:29:43,390 INFO poweroff operations remaining: 97
2013-09-12 09:29:43,391 DEBUG Removing desktop file ('/home/u/.config/autostart/pm_test.desktop')...
2013-09-12 09:29:43,391 DEBUG Restoring sudoers configuration...
2013-09-12 09:29:43,392 DEBUG Executing: "sed -i -e '/# Automatically added by pm.py/,+1d' /etc/sudoers"...
2013-09-12 09:29:43,460 DEBUG Restoring autologin configuration...
2013-09-12 09:29:43,481 ERROR Poweroff time greater than expected: 14:43:17.391186 > 0:05:00
2013-09-12 09:29:43,481 INFO Poweroff test failed

We can have a fail result in test case poweroff_check. It helps tester to notice the fail.

Revision history for this message
Jerry Kao (jerry.kao) wrote :

@Daniel

I met another situation. Reboot test complete but the test case (reboot_check) is fail. It is also confusing.

Daniel Manrique (roadmr)
tags: added: job
Ara Pulido (ara)
Changed in checkbox:
status: Incomplete → Triaged
Revision history for this message
Jerry Kao (jerry.kao) wrote :

stress/reboot and stress/poweroff are completed but stress/reboot_check and stress/poweroff_cehck are failed.

https://certification.canonical.com/hardware/201310-14256/submission/95244/

Revision history for this message
Daniel Manrique (roadmr) wrote :

I think the commands are wrong. On the directory you sent, there are two log files:

$ ls pm_test.*.100.log
pm_test.poweroff.100.log pm_test.reboot.100.log

The pm_log_check script takes two arguments, the file to verify and the output log, but it's being called with *three* arguments, which I think is causing the failure. The command is this:

pm_log_check --log-level=notset $CHECKBOX_DATA/*poweroff.100.log $CHECKBOX_DATA/pm_log_check_poweroff.100.log

Since the two files mentioned above match the *poweroff.100.log, there's some bad behavior happening here.

I'm sending a merge request with a fix proposal. You can test if it works by editing /usr/share/checkbox/jobs/stress.txt and editing the pm_log_check commands, replacing the * with pm_test, so they look like this (example for reboot test, should be all in one line):

command: pm_log_check --log-level=notset $CHECKBOX_DATA/pm_test.reboot.100.log $CHECKBOX_DATA/pm_log_check_reboot.100.log

Revision history for this message
Daniel Manrique (roadmr) wrote :

Hi Jerry,

I checked and a fix for this bug should have landed a few weeks ago. Could you please confirm things work as expected now? I set the bug to Fix Committed but will await for your confirmation before moving to Fix Released.

Changed in checkbox:
milestone: none → 2014-01-17
status: Triaged → Fix Committed
Revision history for this message
Jerry Kao (jerry.kao) wrote :

Hi Daniel,

I just finished a test with "checkbox 0.17.2+bzr1999+201312160133~ubuntu12.04.1" on Stella-Amber1, the issue still can be seen. Stress/reboot and stress/poweroff are completed but stress/reboot_check and stress/poweroff_cehck are failed.
test submission stress/reboot and stress/poweroff are completed but stress/reboot_check and stress/poweroff_cehck are failed.

Revision history for this message
Jerry Kao (jerry.kao) wrote :

Hi Daniel,
sorry for worng link
test submission https://certification.canonical.com/submissions/status/6966

Jerry Kao (jerry.kao)
Changed in checkbox:
status: Fix Committed → Confirmed
Changed in checkbox:
milestone: 2014-01-17 → 2014-jan-31
Changed in checkbox:
milestone: 2014-jan-31 → 2014-feb-14
Zygmunt Krynicki (zyga)
Changed in checkbox:
milestone: 2014-feb-14 → 2013-feb-28
Changed in checkbox:
milestone: 2013-feb-28 → 2014-mar-14
Daniel Manrique (roadmr)
Changed in checkbox:
milestone: 2014-mar-14 → 2014-mar-28
Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

Hi Jerry. I just noticed that in your last comment from January you mentioned you used: 0.17.2+bzr1999+201312160133~ubuntu12.04.1. This version is from last year, so can you please test again with the latest version of checkbox?

Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

Jerry, actually I apologies, ignore that - the revision is of the release branch. It should be the right one.

Zygmunt Krynicki (zyga)
Changed in checkbox:
status: Confirmed → Fix Released
Changed in checkbox:
milestone: 2014-mar-28 → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.