Cobbler install of recent raring-desktop images failing

Bug #1092924 reported by Max Brustkern
This bug affects 2 people
Affects          Status     Importance  Assigned to  Milestone
UTAH             Triaged    High        Andy Doan
linux (Ubuntu)   Confirmed  High        Brad Figg
Raring           Confirmed  High        Brad Figg

Bug Description

Automatic installation via CobblerMachine of raring desktop amd64 images has been failing since build number 20121220. This happens with both the bootspeed preseed and the default preseed. i386 images still seem to work as expected.

Changed in utah:
status: New → Incomplete
importance: Undecided → Medium
status: Incomplete → Triaged
Max Brustkern (nuclearbob) wrote :

Even when using an empty preseed, the installation still seems to stall. In many cases, the installation stalls during the "copying files" step at 27 of 69 files.

At other times, the machine becomes unresponsive, showing only a black screen. Over the KVM it is difficult to be certain, but it appears that the machine does send a video signal; it's just a full black screen that's getting sent.

I'd like to see if there is another hardware platform on which we can try to recreate this.

Max Brustkern (nuclearbob) wrote :

Also, the affected machines appear to show a console behind ubiquity instead of a desktop. The top bar with indicators is still present, however, so it may be a framebuffer issue.

Max Brustkern (nuclearbob) wrote :

This may be happening on i386 images as well, but I have had more recent successes with those.

Max Brustkern (nuclearbob) wrote :

It looks like lp1080701 may be the root cause of this. I'm trying an install after deleting the existing partitions. If that is the case, we can erase the partition table on a machine when we're done with the tests.

Max Brustkern (nuclearbob) wrote :

Removing the partitions doesn't seem to have helped, so there may be other issues besides this bug. I'm trying to manually install via cobbler to see what step fails.

Max Brustkern (nuclearbob) wrote :

Current server ISOs work. Quantal desktop ISOs work. The daily amd64 ISO from 2012-12-19 works, but I think some earlier raring ISOs don't. I'm going to continue trying some other options to narrow things down.

Max Brustkern (nuclearbob) wrote :

The 2012-12-21 amd64 desktop ISO works, but I haven't been able to get 2012-12-22 to work. I'll try i386.

summary: - Cobbler install of recent raring-desktop-amd64 images failing
+ Cobbler install of recent raring-desktop images failing
Max Brustkern (nuclearbob) wrote :

My previous assumptions about the date range for images with this bug may be incorrect. I'm seeing a lot of issues when trying to re-run older raring ISOs, and we're seeing more success now on newer images. Which runs hit this bug and which hit lp:1099995 is becoming harder to tell without examining each run individually, which is impossible to do retroactively.

Changed in utah:
importance: Medium → High
Gema Gomez (gema)
Changed in utah:
assignee: nobody → Max Brustkern (nuclearbob)
Andy Doan (doanac)
Changed in utah:
assignee: Max Brustkern (nuclearbob) → Andy Doan (doanac)
Andy Doan (doanac) wrote :

I spent some time reproducing this today. I'm using a local UTAH branch and the 20130116 build (it was the most recent that was tagged with this bug). Over the course of the day I've run the following command 14 times:

 sudo -u jenkins -i PYTHONPATH=/home/doanac/utah run_bootspeed_job.py -d -i /data/iso/ubuntu/daily-live/20130116/raring-desktop-amd64.iso acer-veriton-04

Of those 14 runs, I saw 6 failures. One of the failures appeared to be bug:

 https://bugs.launchpad.net/utah/+bug/1099995

The other 5 failed the same way each time. The install froze where the status showed:

 retrieving file 12 of 69

The message showing up just before that (in the 2 cases I happened to be watching) was "scanning the mirror".

At this point the system was totally locked up. I tried to send "ctrl-alt-f2" to get a CLI and poke around, but it was totally frozen.

Not sure how helpful this is yet - I guess we need to understand what's happening while retrieving this mysterious 12th file.
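For anyone repeating this kind of experiment, here is a minimal sketch of a harness that reruns a command and tallies the failures. It assumes the job script exits non-zero when an install fails, which is not confirmed anywhere in this report; `run_trials` and `summarize` are hypothetical helper names, and the demo command is a harmless stand-in rather than the real job.

```python
import subprocess
import sys


def run_trials(command, attempts):
    """Run `command` `attempts` times and return the list of exit codes."""
    exit_codes = []
    for _ in range(attempts):
        # check=False: a non-zero exit is a data point here, not an exception.
        result = subprocess.run(command, check=False)
        exit_codes.append(result.returncode)
    return exit_codes


def summarize(exit_codes):
    """Return (failures, total); any non-zero exit code counts as a failure."""
    failures = sum(1 for code in exit_codes if code != 0)
    return failures, len(exit_codes)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. In the lab this would be
    # the run_bootspeed_job.py invocation quoted above (assumed to signal
    # failure through its exit status).
    demo_command = [sys.executable, "-c", "raise SystemExit(0)"]
    failures, total = summarize(run_trials(demo_command, 3))
    print(f"{failures} of {total} runs failed")
```

A hard lockup of the target would still need a timeout on the job side; this sketch only counts whatever exit status comes back.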

Andy Doan (doanac) wrote :

With Max's new rsyslog functionality, I've gotten a stack trace out of the 2013-02-04 build.

2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324073] INFO: task Xorg:5083 blocked for more than 120 seconds.
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324086] Xorg D ffff88007e293f40 0 5083 3871 0x00400000
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324094] ffff8800780e7c58 0000000000000046 ffff880075661740 ffff8800780e7fd8
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324103] ffff8800780e7fd8 ffff8800780e7fd8 ffff88007ac21740 ffff880075661740
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324110] ffff8800725e3800 ffff8800770ae268 ffff8800725e0000 0000000000000000
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324118] Call Trace:
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324133] [<ffffffff816c6e79>] schedule+0x29/0x70
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324190] [<ffffffffa00a02e5>] intel_crtc_wait_for_pending_flips+0x75/0xd0 [i915]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324199] [<ffffffff8107dc00>] ? finish_wait+0x80/0x80
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324232] [<ffffffffa00a2d67>] i9xx_crtc_disable+0x87/0x180 [i915]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324266] [<ffffffffa00a7baf>] intel_crtc_update_dpms+0x6f/0xa0 [i915]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324299] [<ffffffffa00adc47>] intel_crt_dpms+0x77/0xc0 [i915]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324331] [<ffffffffa00159b0>] drm_mode_obj_set_property_ioctl+0x330/0x340 [drm]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324358] [<ffffffffa00159f0>] drm_mode_connector_property_set_ioctl+0x30/0x40 [drm]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324382] [<ffffffffa0004559>] drm_ioctl+0x4e9/0x5b0 [drm]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324411] [<ffffffffa00159c0>] ? drm_mode_obj_set_property_ioctl+0x340/0x340 [drm]
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324419] [<ffffffff81085bf8>] ? lg_global_unlock+0x48/0x60
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324427] [<ffffffff811a52a9>] do_vfs_ioctl+0x99/0x570
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324435] [<ffffffff8106be9b>] ? recalc_sigpending+0x1b/0x60
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324442] [<ffffffff8106c7b7>] ? __set_task_blocked+0x37/0x80
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324449] [<ffffffff811a5811>] sys_ioctl+0x91/0xb0
2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324458] [<ffffffff816d06dd>] system_call_fastpath+0x1a/0x1f
2013-02-15T20:21:39+00:00 acer-veriton-04 kernel: [ 969.324076] INFO: task Xorg:5083 blocked for more than 120 seconds.
2013-02-15T20:21:39+00:00 acer-veriton-04 kernel: [ 969.324084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2013-02-15T20:21:39+00:00 acer-ver...
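To compare traces like this across runs, a small parser can pull out the blocked task and the reliable frames of its call trace. The sketch below is only an illustration built on the line format shown above; `parse_hung_tasks` is a hypothetical helper, and frames prefixed with "?" are skipped because the kernel marks those as unverified stack entries.

```python
import re

# "INFO: task Xorg:5083 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[<ffffffffa00a02e5>] intel_crtc_wait_for_pending_flips+0x75/0xd0 [i915]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<symbol>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def parse_hung_tasks(lines):
    """Group call-trace frames under the hung-task message that precedes them."""
    reports = []
    for line in lines:
        header = HUNG_TASK.search(line)
        if header:
            reports.append(
                {"task": header["name"], "pid": int(header["pid"]), "frames": []}
            )
            continue
        frame = FRAME.search(line)
        # Keep only frames without the "?" marker; module is None for core kernel.
        if frame and reports and not frame["unreliable"]:
            reports[-1]["frames"].append((frame["symbol"], frame["module"]))
    return reports


SAMPLE = [
    "2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324073] INFO: task Xorg:5083 blocked for more than 120 seconds.",
    "2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324133] [<ffffffff816c6e79>] schedule+0x29/0x70",
    "2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324190] [<ffffffffa00a02e5>] intel_crtc_wait_for_pending_flips+0x75/0xd0 [i915]",
    "2013-02-15T20:19:39+00:00 acer-veriton-04 kernel: [ 849.324199] [<ffffffff8107dc00>] ? finish_wait+0x80/0x80",
]

if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["task"], report["pid"], report["frames"])
```

Grouping by the first driver frame (here in i915) makes it easier to tell whether two failed runs hung in the same place.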

tags: added: kernel-key
tags: added: raring
Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → High
status: Triaged → New
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1092924

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Andy Doan (doanac) wrote : apport information

ApportVersion: 2.8-0ubuntu4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: jenkins 1655 F.... pulseaudio
CurrentDmesg:

DistroRelease: Ubuntu 13.04
HibernationDevice: RESUME=UUID=e219028f-d0cf-4acc-a723-87059e3526eb
InstallationDate: Installed on 2013-02-19 (0 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Alpha amd64 (20130204)
MachineType: Acer Veriton N281G
MarkForUpload: True
Package: linux (not installed)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-4-generic root=UUID=4d2cc5ec-fdd8-4133-846f-c7d56a83f062 ro initcall_debug quiet printk.time=1
ProcVersionSignature: Ubuntu 3.8.0-4.8-generic 3.8.0-rc6
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-4-generic N/A
 linux-backports-modules-3.8.0-4-generic N/A
 linux-firmware 1.100
Tags: raring running-unity raring running-unity
Uname: Linux 3.8.0-4-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dialout lpadmin plugdev sambashare sudo utah
dmi.bios.date: 04/06/2011
dmi.bios.vendor: Acer
dmi.bios.version: P01-A3L
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: Veriton N281G
dmi.board.vendor: Acer
dmi.board.version: To be filled by O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: Acer
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAcer:bvrP01-A3L:bd04/06/2011:svnAcer:pnVeritonN281G:pvrToBeFilledByO.E.M.:rvnAcer:rnVeritonN281G:rvrTobefilledbyO.E.M.:cvnAcer:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: Veriton N281G
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: Acer

tags: added: apport-collected running-unity
Andy Doan (doanac) wrote : AlsaInfo.txt

apport information

Andy Doan (doanac) wrote : BootDmesg.txt

apport information

Andy Doan (doanac) wrote : CRDA.txt

apport information

Andy Doan (doanac) wrote : IwConfig.txt

apport information

Andy Doan (doanac) wrote : Lspci.txt

apport information

Andy Doan (doanac) wrote : Lsusb.txt

apport information

Andy Doan (doanac) wrote : ProcCpuinfo.txt

apport information

Andy Doan (doanac) wrote : ProcInterrupts.txt

apport information

Andy Doan (doanac) wrote : ProcModules.txt

apport information

Andy Doan (doanac) wrote : PulseList.txt

apport information

Andy Doan (doanac) wrote : RfKill.txt

apport information

Andy Doan (doanac) wrote : UdevDb.txt

apport information

Andy Doan (doanac) wrote : UdevLog.txt

apport information

Andy Doan (doanac) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu Raring):
status: Incomplete → Confirmed
Andy Doan (doanac) wrote :

Things seem to fail a little less often today. However, with the 2013-02-19 64-bit desktop build, I've gotten the following failure in 2 out of 7 tries:

 http://paste.ubuntu.com/1684415/

The first 494 lines are mainly UTAH output. Line 496 is where we start displaying the rsyslog messages we are getting from the target. Line 2006 is where the crash seems to occur.

Andy Doan (doanac) wrote :

Tested the 2013-02-20 live desktop image today and hit the problem in 2 out of 4 tries.

Brad Figg (brad-figg) wrote :

I've had a quick chat with Andy and he is going to test a number of images to try to find when this first started happening. If he can discover that, it will get us a long way to finding where this issue was introduced.

Brad Figg (brad-figg)
Changed in linux (Ubuntu Raring):
assignee: nobody → Brad Figg (brad-figg)
Andy Doan (doanac) wrote :

The magners-orchestra server had images going back as far as 2012-12-03. I'm seeing this image fail with pretty much the same stack trace:

 http://paste.ubuntu.com/1692911/

Andy Doan (doanac) wrote :

I was supposed to try out different kernels on a fixed ISO today. It's possible I wasn't doing it correctly, but telling UTAH to just use an alternative vmlinuz wasn't working because the system wouldn't boot.

Nonetheless, I did run my "recreate" script on 3 different systems in the lab with different video cards (nvidia, radeon, and intel). These all worked without any problems. So the issue seems to be specific to the card that's in acer-veriton-*.

Andy Doan (doanac) wrote :

2013-03-25 still failing. I'm not sure creating an ISO is that useful since this seems to be pretty specific to the acer-veritons in our lab.

@brad - I have a simple script that can be run from our lab to re-create this bug. Maybe someone on your team could try this out and poke around?

Andy Doan (doanac) wrote :

I've done some bisection and am really stuck on getting much further. The regression appears to have been introduced during the switch from 3.5 to 3.7 kernels. I see installs work reliably with 3.5.0-17 kernels, but they fail with 3.7.0-4.
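Because the hang is intermittent, a single clean install does not prove a kernel is good. As a rough guide, and assuming runs are independent with a failure rate like the 5 lockups in 14 runs seen in the earlier reproduction, this sketch estimates how many consecutive passes are needed before calling a kernel "good" during bisection. `runs_needed` is a hypothetical helper, not part of UTAH.

```python
import math


def runs_needed(failure_rate, max_false_good):
    """Smallest n such that (1 - failure_rate) ** n <= max_false_good.

    failure_rate: per-run probability that a bad kernel shows the hang.
    max_false_good: acceptable chance of a bad kernel passing every run.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0 < max_false_good < 1:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1 - failure_rate))


if __name__ == "__main__":
    # 5 lockups in 14 runs, accepting a 5% chance of mislabelling a bad kernel.
    print(runs_needed(5 / 14, 0.05))
```

With those numbers the answer is 7 passes per kernel, since (9/14)^7 is about 0.045 while (9/14)^6 is about 0.071.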

Andy Doan (doanac) wrote :
tags: removed: kernel-key