Crash during breezy->dapper upgrade

Bug #37430 reported by Stewart Smith on 2006-03-31
Affects: module-init-tools (Ubuntu); Importance: High; Assigned to: Ben Collins
Affects: pcmcia-cs (Ubuntu); Importance: Medium; Assigned to: Unassigned

Bug Description

Upgrading using aptitude, after having replaced occurrences of breezy in the repository list with dapper.

modprobe had gotten stuck in an infinite loop during pcmcia-cs startup (process '/etc/init.d/pcmcia-cs start' was running with a child of modprobe that was using 100% cpu). I killed the 'pcmcia-cs start' process and the upgrade continued.

Until a hard crash when starting acpi.

Naturally, now a half upgraded system that won't boot.

At the time I had the vmware modules loaded, but no virtual machines running. This is the free vmware server recently released.

Luckily I'm an advanced enough user that I can get around this with boot cds, dpkg foo and other such things. So I am making progress in getting back to a system that even boots.

Alexandre Otto Strube (surak) wrote :

Do you have any other information? It got stuck in ACPI. What kind of chipset/mainboard/processor is that? What kernel were you using?

Stewart Smith (stewart) wrote :

Using the latest breezy 686smp kernel. It is a 2.8 GHz P4 with HT enabled. dmesg from dapper is below:

[4294667.296000] Linux version 2.6.15-19-386 (buildd@rothera) (gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu3)) #1 PREEMPT Mon Mar 20 16:46:02 UTC 2006
[4294667.296000] BIOS-provided physical RAM map:
[4294667.296000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[4294667.296000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[4294667.296000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
[4294667.296000] BIOS-e820: 0000000000100000 - 000000007ff30000 (usable)
[4294667.296000] BIOS-e820: 000000007ff30000 - 000000007ff40000 (ACPI data)
[4294667.296000] BIOS-e820: 000000007ff40000 - 000000007fff0000 (ACPI NVS)
[4294667.296000] BIOS-e820: 000000007fff0000 - 0000000080000000 (reserved)
[4294667.296000] BIOS-e820: 00000000ffb80000 - 0000000100000000 (reserved)
[4294667.296000] 1151MB HIGHMEM available.
[4294667.296000] 896MB LOWMEM available.
[4294667.296000] found SMP MP-table at 000ff780
[4294667.296000] On node 0 totalpages: 524080
[4294667.296000] DMA zone: 4096 pages, LIFO batch:0
[4294667.296000] DMA32 zone: 0 pages, LIFO batch:0
[4294667.296000] Normal zone: 225280 pages, LIFO batch:31
[4294667.296000] HighMem zone: 294704 pages, LIFO batch:31
[4294667.296000] DMI 2.3 present.
[4294667.296000] ACPI: RSDP (v000 ACPIAM ) @ 0x000f9e60
[4294667.296000] ACPI: RSDT (v001 A M I OEMRSDT 0x08000320 MSFT 0x00000097) @ 0x7ff30000
[4294667.296000] ACPI: FADT (v002 A M I OEMFACP 0x08000320 MSFT 0x00000097) @ 0x7ff30200
[4294667.296000] ACPI: MADT (v001 A M I OEMAPIC 0x08000320 MSFT 0x00000097) @ 0x7ff30390
[4294667.296000] ACPI: OEMB (v001 A M I OEMBIOS 0x08000320 MSFT 0x00000097) @ 0x7ff40040
[4294667.296000] ACPI: DSDT (v001 P4P81 P4P81086 0x00000086 INTL 0x02002026) @ 0x00000000
[4294667.296000] ACPI: PM-Timer IO Port: 0x808
[4294667.296000] ACPI: Local APIC address 0xfee00000
[4294667.296000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[4294667.296000] Processor #0 15:2 APIC version 20
[4294667.296000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[4294667.296000] Processor #1 15:2 APIC version 20
[4294667.296000] WARNING: NR_CPUS limit of 1 reached. Processor ignored.
[4294667.296000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[4294667.296000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
[4294667.296000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[4294667.296000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[4294667.296000] ACPI: IRQ0 used by override.
[4294667.296000] ACPI: IRQ2 used by override.
[4294667.296000] ACPI: IRQ9 used by override.
[4294667.296000] Enabling APIC mode: Flat. Using 1 I/O APICs
[4294667.296000] Using ACPI (MADT) for SMP configuration information
[4294667.296000] Allocating PCI resources starting at 88000000 (gap: 80000000:7fb80000)
[4294667.296000] Built 1 zonelists
[4294667.296000] Kernel command line: root=/dev/hde3 ro quiet splash
[4294667.296000] mapped APIC to ffffd000 (fee00000)
[4294667.296000] mapped IOAPIC to ffffc00...

Stewart Smith (stewart) wrote :

the motherboard is an ASUS P4P 800 Deluxe

Michael Still (mikal) wrote :

I had the same freeze during the "Starting PCMCIA" step. I rebooted into rescue mode and ran dpkg from there to get it all working...

Dave Gilbert (ubuntu-treblig) wrote :

I can confirm this bug. I had this on my Tyan S2460 board.
Original was an up to date breezy, upgraded to dapper via an apt-get dist-upgrade today (starting about 4pm BST 9th April).
The hang was purely of keyboard access for me, the modprobe was of i82365
At the time there was nothing incriminating in dmesg last message was

Intel ISA PCIC probe: not found

I killed the parent process and then mouse clicks stopped after the next stage (which I think was bluetooth).

Matt Zimmerman (mdz) wrote :

There could be more than one bug here, or the eventual hang could be a result of a problem with loading the PCMCIA module.

For the hang: Please try to get a backtrace of the crash by following https://wiki.ubuntu.com/DebuggingSystemCrash

For the pcmcia infinite loop issue: Please get dmesg output at the point where it is looping, and try stracing the modprobe process to see what it is doing.

Passing this bug to linux-source-2.6.15 since the most serious issue is the hang; if we discover a secondary issue with pcmcia-cs, that will be filed separately
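The diagnostics requested above could be gathered roughly as follows. This is only a sketch: `busy_procs` is a hypothetical helper name, the 90% threshold and the output file names are arbitrary, and the commands that touch the live system are left as comments.

```shell
#!/bin/sh
# Hypothetical helper: read "PID %CPU COMMAND" lines on stdin and print the
# PID and command of every process at or above the given CPU percentage.
busy_procs() {
    awk -v min="$1" '$2 + 0 >= min + 0 { print $1, $3 }'
}

# Real use (commented out so the sketch has no side effects):
#   ps -eo pid=,pcpu=,comm= | busy_procs 90
#   dmesg > dmesg-during-loop.txt
#   sudo strace -p <PID> -o modprobe.strace   # attach to the spinning modprobe

# Demonstration on canned input resembling the modprobe process in this report.
printf '%s\n' '13666 100 modprobe' '1 0.0 init' | busy_procs 90
```

Saving the dmesg and strace output to files makes them easy to attach to the bug. A later comment in this thread reports that the looping modprobe could not be killed even with `kill -9`, so attaching strace to it may not succeed either.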

Mike Patterson (mpatters) wrote :

I can confirm this as well - I've got several machines at my disposal here and it only crashes on one of them. An Asus P4P800-VM with IDE disk works fine; VMWare Workstation 5.5 works fine; an Asus P5LD2-VM with SATA crashes. Booting off CD and chrooting then doing dpkg --configure fixed it up.

I'll try in the next couple of days to get backtraces like Matt Zimmerman asks.

Mike Patterson (mpatters) wrote :

Worked just ducky for me on the same hardware that was crashing last week (albeit with the -386 kernel on a fresh install), so maybe this issue has been fixed (at least for my hardware). I'll continue experimenting with different kernel versions.

Mike Patterson (mpatters) wrote :

It appears to be confined to when I'm using the 686-smp kernel - it locked up again on the P5LD2 motherboard.

You can see details here:
 https://www.cs.uwaterloo.ca/twiki/view/CF/BreezyToDapper

although that twiki isn't editable by people outside of our department. There's some very brief notes there (intended to be supplemented with our non-public RT), along with kern.log files from the machine in question and also some crappy pictures I took with our digital camera - but they don't show much, nothing came up on screen when I did the sysrq except "SysRq: Show State".

Alexandre Otto Strube (surak) wrote :

Mike, Stewart, any news on it? Thanks a lot!

Joe Kilner (joekilner) wrote :

I have had two similar occurrences.

First was on a Dell desktop machine (can get the exact spec, but it was a dual Xeon job). I was installing using the gksudo "update-manager -d" command as mentioned in the ubuntu upgrade notice (after doing a full upgrade in aptitude to make sure everything was up to date). The machine (as well as the second one I will get to in a minute) was a Kubuntu CD install on to which I then installed the full Ubuntu package.

All was going well until I hit the modprobe hang that the other user has mentioned above. I also killed the pcmcia-cs process and the installation proceeded fine. There was another hang later on another configure step, but I Ctrl-C'd that and things carried on smoothly. An initial boot tried to use the old 686 SMP kernel on the machine and seg-faulted everywhere (something I assume will go away when there is an up to date 686 SMP kernel), so I rebooted into the standard 686 kernel. This booted fine (and very quickly :) ) but I could then not run adept (package library was locked). I fell back to aptitude to see if that would work and it told me to run 'dpkg --configure -a' which I did. This unlocked the system but ended up with some failed packages caused by kcontrol not configuring properly. Eventually (and for no apparent reason) after a few attempts to re-install kcontrol it suddenly configured itself properly and everything now works fine!

Buoyed by this success I decided to try the same process at home (Dual Athlon MP 1600's with 1 GB RAM on an Asus A7M266-D). The installation fell over at the PCMCIA stage as it had on my work PC, but unlike on the Dell I could not start a terminal or anything to kill the runaway process. As a result I had to shut down the PC mid-upgrade and now it hangs every time it tries to boot with the "Starting PCMCIA" message (this is the second PCMCIA message - there is already a "Start PCMCIA [failed]" entry in the boot text). Anyway, the result is currently a dead system (unless my search for booting with PCMCIA disabled proves fruitful).

So that's my experience, hope there is a clue to what's going on in there somewhere (I know how hard debugging this kind of thing can be). My guess would be some concurrency issue in the modprobe and the rest of these issues are just resulting symptoms, but I know how annoying having "customers" trying to guess the causes of their bugs can be, so I'll keep quiet ;)

Stewart Smith (stewart) wrote :

When I get home (about a week away) I should be able to find some time to do a breezy install and then trial the upgrade again.

Joe Kilner (joekilner) wrote :

I finally managed to fix the installation on the dual Athlon machine by pressing Ctrl-C during a single user mode boot to avoid the PCMCIA hang, and then pressing Ctrl-C again when pcmcia-cs tried to run during "dpkg --configure -a". This had to be repeated a few times... All the problems disappeared once the new K7 kernel had been installed and I could boot using that.

Given my experience I strongly suspect that the issue here is the new PCMCIA code running against a breezy SMP kernel (as has already been noted in the thread). It could be that an easy fix would be to recommend that people boot using a non-SMP kernel before upgrading.

Mike Patterson (mpatters) wrote :

Alexandre, same issue again, I tried it with update-manager this time, same story except it wouldn't even let me use my local mirror. :)

A co-worker updated a home machine (non-SMP kernel) with no issues, so I really think that's the key.

Matt Zimmerman (mdz) on 2006-05-03
Changed in linux-source-2.6.15:
assignee: nobody → ben-collins

Ben Collins (ben-collins) wrote :

From what I understand in this bug report, the system hangs during the upgrade process, but works ok on a fresh dapper install, and otherwise is fine once dapper is running.

If this is the case, then the bug is against breezy for crashing during the installation, or against something like pcmcia for doing bad things during the upgrade process (unloading/loading modules in an incompatible way).

Please confirm that what I think is what is occurring.

Mike Patterson (mpatters) wrote :

Ben, that's what it would seem to me, although it's not the PCMCIA bit that's the worst, it's the ACPI bits - wouldn't surprise me to find that it was the same problem in both cases though.

Andrew Ash (ash211) wrote :

I too encountered this bug upgrading from kubuntu breezy to dapper this morning. Restarting PCMCIA during the dist-upgrade process froze the system, which stopped responding to mouse movements and wouldn't drop to a virtual console with Ctrl+Alt+F1.

After forcing the system off by holding the power button, kernel 2.6.12-10-686-smp (what I was using) wouldn't boot because of a string of SegFaults. Neither would the recovery mode of that kernel, but I was able to get to a root console using an old kernel 2.6.12-9-386 in recovery mode.

'sudo apt-get dist-upgrade' led me to try 'dpkg --configure -a', which worked as everyone has said. But starting X didn't get kdm started. It turns out that the kubuntu-desktop package had disappeared from my system as well. Installing it with sudo apt-get then worked fine. It would seem that SMP is the common problem here, though that's just a personal observation.

Paul Tarjan (spam-paulisageek) wrote :

I have just run into this now :(

Same dmesg, same PCMCIA hanging. My only difference was the whole computer locked up at some step after I pressed Ctrl+C while the pcmcia-cs package was being configured. After rebooting into rescue mode (adding rw init=/bin/bash to my kernel line in grub), I tried to run dpkg --configure -a. It is still hanging on the pcmcia-cs package. :( I removed it but it broke other dependencies. I guess I'll be spending some more time on this :(.

Ben Collins (ben-collins) wrote :

My only issue is that this happens on upgrade. At which point, the 2.6.15 kernel is not booted, so this cannot be a bug in that package. It may be a bug in the 2.6.12 kernel, but the fact that it only occurs on upgrade leads me to believe that pcmcia-cs or something is doing the wrong thing on upgrade (performing tasks that should only be done when 2.6.15 is finally booted).

I think you might be right. I removed pcmcia-cs (and all dependencies)
while running the .12 kernel, and I was able to finally boot my .15
kernel. From there, I installed kubuntu-desktop, which depended on
pcmcia-cs. I bit my tongue and hoped, but all was well during this
install of pcmcia-cs.

Michael R. Head (burner) wrote :

I just saw this while upgrading from the latest breezy updates to dapper using the "sudo update-manager -d" method. update-manager has stopped and running top on the machine shows modprobe using 100% CPU with load at ~3.0 (no other processes are using CPU).

The machine in question is a AMD Athlon 64 X2 4400+ running the 32bit version of breezy with the 2.6.12-10-k7-smp kernel.

Michael R. Head (burner) wrote :

BTW: I believe I also had this problem during my upgrade of a HT-enabled P4 machine using the 686-smp kernel a few months back.

Michael R. Head (burner) wrote :

One more comment: The P4 completely hung during upgrade. The Athlon here is still alive. I've got a screenshot of the upgrade-manager. I can't close the tick for "Terminal" for some reason, but I can scroll around in that area.

Screenshot of the upgrade-manager during pcmcia-cs upgrade. Here is the top of 'top':
top - 04:59:06 up 1 day, 12:52, 4 users, load average: 3.00, 3.00, 2.78
Tasks: 62 total, 5 running, 57 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 50.0% sy, 0.0% ni, 50.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2076168k total, 1989944k used, 86224k free, 143908k buffers
Swap: 1558232k total, 10996k used, 1547236k free, 1625788k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13666 root 18 0 1596 584 484 R 100 0.0 16:26.58 modprobe

Michael R. Head (burner) wrote :

One more spam to this bug (sorry for not combining all these comments into one post).

On the machine under upgrade, it is not possible to kill modprobe, so "sudo kill -9 13666" leaves the above process alive. After killing the update manager, I did a sudo dpkg --configure -a to complete the upgrade, and the machine has now hung. It doesn't even respond to pings anymore.

After attempting to boot with the 2.6.15-386 kernel, I get a kernel panic -- not syncing: VFS: Unable to mount root fs on unknown block(0,0). I'm guessing its initramfs wasn't made properly due to the failed pcmcia upgrade.

When I boot with the old 2.6.12-10-k7-smp kernel, the boot process stops at the "Starting PCMCIA services..." script. Ctrl-alt-delete and Ctrl-C fail to do anything, but I can see my keystrokes echoed on the text terminal.

When I boot with the 2.6.12-10-386 kernel, I get a nice big kernel backtrace on the console during "starting pcmcia services...", but the boot continues. With the non-SMP kernel, I was able to run the sudo dpkg --configure -a and finish the upgrade. After completing the upgrade, the 2.6.15-23-k7 kernel boots the machine fine.

So it seems that there's some serious problem with the dapper versions of modprobe or pcmcia when running on the 2.6.12 SMP kernels.

Here is the kernel backtrace (pulled from /var/log/dmesg) that occurred during the boot of the 2.6.12-386 kernel on my athlon64 X2 dual core box.

eze80 (ezequiel-pozzo) wrote :

Hello!

I got the bug also. I was upgrading my kubuntu breezy to dapper and the pcmcia step hung my PC after several tries to configure; then, while booting, the pcmcia module hung my PC again every time I tried. I'm kind of a noob yet so I couldn't recover, had to reinstall kubuntu (no data lost, I love Linux!).

Anyway, I tried again to update, this time making sure I didn't have the pcmcia package installed. The installation went fine. (I had to reinstall Breezy again for a different bug I already reported, just my luck).

I was reading the above comments and I can tell I also have: 686smp kernel, and a P4 in an Asus P5LD2 mobo with SATA.

Any other information I can give? I'm afraid I can't report any debug information of the crash because I reinstalled my system. But maybe there are other things I can help with...

Thanks!

Michael R. Head (burner) wrote :

Would changing 2.6.* to 2.6.15.* in the section below from /etc/init.d/pcmcia be a reasonable fix? Could it break something for laptops during the upgrade?

case "$1" in
    start)
        if [ "x$PCIC" = x ] && expr $(uname -r) : "2.6.*" >/dev/null 2>&1 \
            && ! ls /sys/class/pcmcia_socket/* >/dev/null 2>&1; then
            log_success_msg "PCMCIA not present"
            exit 1
        fi
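To see what the proposed pattern change does, the `expr` test can be tried against kernel release strings mentioned in this report. This is a sketch only: `kernel_matches` is a hypothetical wrapper, not part of the init script, and it says nothing about whether the change is safe on laptops.

```shell
#!/bin/sh
# kernel_matches PATTERN RELEASE: succeed when RELEASE matches PATTERN the
# same way the init script's `expr $(uname -r) : PATTERN` test would.
# (Hypothetical helper name; expr anchors the pattern at the start.)
kernel_matches() {
    expr "$2" : "$1" >/dev/null 2>&1
}

# Compare the current pattern with the proposed one on a breezy SMP kernel
# and a dapper kernel, both taken from comments in this report.
for release in 2.6.12-10-686-smp 2.6.15-23-k7; do
    for pattern in '2.6.*' '2.6.15.*'; do
        if kernel_matches "$pattern" "$release"; then
            echo "$release matches $pattern"
        else
            echo "$release does not match $pattern"
        fi
    done
done
```

With the narrower pattern a 2.6.12 kernel no longer satisfies the condition, so on breezy kernels the "PCMCIA not present" early exit in the quoted section would be skipped rather than taken. Note also that `.` in an `expr` pattern matches any character, so these patterns are looser than they look.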

Mike Patterson (mpatters) wrote :

Well, it may well be, but that doesn't help the later problem with ACPI, which is where things *really* wedged for me.

Michael R. Head (burner) wrote :

Then perhaps this is actually a bug on modutils or module-init-tools. When I was hit by it, it was the _second_ time pcmcia-cs was configured (not when acpi-support was configured). Both of them run modprobe, so that's probably the link.
Shall I reassign?

Michael R. Head (burner) wrote :

This is a pretty major bug for SMP/dualcore users that are upgrading to dapper using a breezy SMP kernel...

Janik De Goÿ (janik-de-goy) wrote :

Had the same problem, also upgrading while running kernel 2.6.12-686-smp on a P4 with HT. System completely stuck. Rebooted into kernel 2.6.12-686. KDE started and everything seemed OK. Opened a terminal, ran sudo dpkg --configure -a, and the upgrade continued and finished.

Now everything is fine. Thanks for your dpkg --configure -a command (I wouldn't have found this myself because I am quite a noob too).

The upgrade how-to should say not to do it with an SMP kernel.

Changed in module-init-tools:
status: Needs Info → Fix Released
Changed in pcmcia-cs:
status: Unconfirmed → Fix Released

Mike Patterson (mpatters) wrote :

I don't know if this affects the "fix released", or if I was doing something wrong or what, but using archive.ubuntu.com and doing a dist-upgrade still crashes on me with smp kernel going from 5.10 -> 6.06, even with latest updates applied to 5.10 first.

jcg (pamjuan) wrote :

My experience is about the same: I installed with "update-manager -d" and it hung at "Starting PCMCIA".
I finished the install with 'dpkg --configure -a' and rebooted with 6.06.
The reboot was OK, but Starting PCMCIA still fails, and I see:
Driver 'sd' needs updating. please use bus-type method
Starting X gives "Your session only lasted less than 10 seconds";
the error is "Registering your session with wtmp and utmp",
and the suggestion is to start in failsafe mode.
Here I am in this mode, and things seem to be well.
Any advice to fix things?

I *think* I managed to get round this problem, although I'm not entirely sure how. This is roughly what happened:

The upgrade process was started via update-manager. It hung on starting pcmcia as described by everyone else, and at the same time the keyboard stopped responding (maybe because I have a USB keyboard?). Mouse input still worked but all keystrokes failed to produce any output. Luckily I was still able to log in via ssh from another computer. I was unable to kill modprobe, which was using 100% CPU according to top, but I managed to kill the pcmcia rc script. The installation then appeared to continue, but got stuck on the install of python-2.4 instead.

At this point I had to force the computer to power off, and power back on. The system booted and seemed to be useable, there were quite a lot of things that weren't working properly according to the kernel boot messages, but luckily nothing critical. I tried 'dpkg --configure -a' as suggested here, but the install just hung at the same place.

Now comes the weird part: since I knew from this page that the problem was caused by the smp kernel, and I didn't have any other kernels installed, I thought I'd try and install another kernel. Initially I tried this via dpkg directly, but then decided to try synaptic. I selected the non-smp linux 686 package, and marked it for installation. When I clicked the 'Apply' button, synaptic appeared to go through the configuration of all the packages that had failed to set up correctly during the original upgrade. Note that I didn't press the "mark updates" button in synaptic. This time the packages seemed to configure OK, and the computer is working again. I haven't done any extensive testing, but both the old 2.6.12-smp kernel and the new 2.6.15-non-smp kernels I now have seem to boot OK. I haven't noticed anything untoward in the kernel boot messages (I have the 'quiet' kernel boot option turned off), and all the programs I've tried so far seem to be working as they're supposed to.

No idea if this will help anyone else but I thought I'd share it.

Hi, Pete ... can you run the following commands for me:

  dpkg-query -W pcmcia-cs

and

  apt-cache policy pcmcia-cs

and provide the output

(same goes for anyone else who, after a breezy->dapper upgrade using the update manager, has it hang on pcmcia)

$ dpkg-query -W pcmcia-cs
pcmcia-cs 3.2.8-5.2ubuntu6

$ apt-cache policy pcmcia-cs
pcmcia-cs:
  Installed: 3.2.8-5.2ubuntu6
  Candidate: 3.2.8-5.2ubuntu6
  Version table:
 *** 3.2.8-5.2ubuntu6 0
        500 http://gb.archive.ubuntu.com dapper-updates/main Packages
        100 /var/lib/dpkg/status
     3.2.8-5.2ubuntu5 0
        500 http://gb.archive.ubuntu.com dapper/main Packages
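For anyone checking several machines, the installed and candidate versions can be pulled out of that output with a small filter. A sketch, assuming the `apt-cache policy` layout shown above; `policy_field` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field (for example "Installed"
# or "Candidate") from `apt-cache policy <package>` output read on stdin.
policy_field() {
    awk -v want="$1:" '$1 == want { print $2; exit }'
}

# Canned sample copied from the output above; on a real system use:
#   apt-cache policy pcmcia-cs | policy_field Installed
sample='pcmcia-cs:
  Installed: 3.2.8-5.2ubuntu6
  Candidate: 3.2.8-5.2ubuntu6'

printf '%s\n' "$sample" | policy_field Installed
printf '%s\n' "$sample" | policy_field Candidate
```

The extracted value can then be compared with another version string using `dpkg --compare-versions <a> lt <b>`, which exits 0 when the first version is lower.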

*ahem*, sorry about that

 pcmcia-cs (3.2.8-5.2ubuntu7) dapper-updates; urgency=low
 .
   * Correct the direction of the test for an upgrade from breezy so that
     pcmcia-cs is only restarted if we're _not_ upgrading that far.

I didn't entirely follow that last comment! Are you saying that the next version of pcmcia-cs will fix this bug?

Incidentally, why does kubuntu-desktop rely on so many non-essential things (including pcmcia-cs)?

It means that the version of pcmcia-cs now in the archive fixes the bug -- the bug only occurs on upgrade from breezy, now that you've upgraded you won't be bitten by it again.

(The bug is that pcmcia-cs loads a module that can crash the breezy kernel; but is perfectly safe to do on dapper -- we forgot that during upgrade, the dapper pcmcia-cs is still running on the breezy kernel)

pcmcia-cs is essential for some people -- maybe their root filesystem is on a pcmcia disk

Martijn Heemels (yggdrasil) wrote :

OK, can someone post some steps to take when you've already been bitten by this bug?

My kernel hung, so I had to do a reset and lots of things are failing now. Is there a general path to take to complete the upgrade? Might be helpful for people reading this thread.

To complete the upgrade:

$ sudo -s

If you're still running the breezy kernel (2.6.12), first do:

# wget http://archive.ubuntu.com/ubuntu/pool/main/p/pcmcia-cs/pcmcia-cs_3.2.5-7ubuntu7_i386.deb
# dpkg --unpack pcmcia-cs_3.2.5-7ubuntu7_i386.deb

For both breezy and dapper kernels, then do:

# dpkg --configure -a
# apt-get update
# apt-get install -f

then use whichever tool you used to perform the upgrade again, it will carry on from where it left off.
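The branching in the steps above (an extra unpack step only on the breezy kernel) can be written as a dry-run helper that prints the commands for a given kernel release without running them. This is a sketch: `recovery_plan` is a hypothetical name, and matching the breezy kernel with `2.6.12*` is an assumption based on the kernel versions quoted in this report.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the recovery commands from the steps
# above for a given kernel release, without executing anything.
recovery_plan() {
    case "$1" in
        2.6.12*)
            # Breezy kernel: unpack the replacement pcmcia-cs package first.
            echo "wget http://archive.ubuntu.com/ubuntu/pool/main/p/pcmcia-cs/pcmcia-cs_3.2.5-7ubuntu7_i386.deb"
            echo "dpkg --unpack pcmcia-cs_3.2.5-7ubuntu7_i386.deb"
            ;;
    esac
    # Common to breezy and dapper kernels.
    echo "dpkg --configure -a"
    echo "apt-get update"
    echo "apt-get install -f"
}

# Show the plan for the kernel that is running right now.
recovery_plan "$(uname -r)"
```

Printing the plan rather than executing it lets you review the commands before running them as root.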
