Feisty crash possibly related to mdadm/raid5

Bug #110304 reported by DesktopMan
Affects | Status | Importance | Assigned to | Milestone
linux-source-2.6.20 (Ubuntu) | Invalid | Low | Unassigned |

Bug Description

Running Linux 2.6.20-15-server #2 SMP 32bit

First, I also hit the "UUID of raid devices have changed after upgrade to Feisty" problem, which I "fixed" by editing mdadm.conf with the new UUID. After that I started the array, which rebuilt itself.

I thought everything was all right until a hard crash. I rebooted; the array was dirty AND degraded, and I had to manually re-add a drive. (Mind you, the drive that faults is random, so there isn't any really bad drive here.)
Even the display output disappears when it crashes.

I don't really know where to look for information on the crash, or even what is causing it. All I know is that it was not happening in 6.10 and that it might be mdadm/raid5 related. I'm also using cryptsetup for 128-bit AES encryption on top of this raid5.

description: updated
Voltaire (jkrueger-muenster) wrote :

Could you please give some more information on the crash? Did your computer while booting? Or was your system up and running? After fixing bug 107080 on my system, it has run normally aside from bug 103603, which is not connected to my RAID problems.

Voltaire (jkrueger-muenster) wrote :

Sorry. This means: "Did your computer crash while booting?"

DesktopMan (christian-auby) wrote :

It crashed after having been up for at least some hours (it crashed while I was sleeping). The raid array was mounted, but as I said, I don't really know if it's related or not.

Voltaire (jkrueger-muenster) wrote :

The output of mdadm --query --detail shows that your RAID is currently rebuilding (Rebuild Status : 4% complete). The UUID and the status of the drives seem to be OK. It's hard to guess what the reason for the crash was.
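
For anyone who wants to watch for the same condition in a script, the rebuild progress quoted above can be pulled out of the mdadm --query --detail text with a few lines of Python. This is only a sketch: the function name rebuild_percent is made up for illustration, and the only line format it assumes is the "Rebuild Status : 4% complete" line mentioned in this comment.

```python
import re
from typing import Optional

# Matches the "Rebuild Status : 4% complete" line quoted in the comment above.
_REBUILD_RE = re.compile(r"Rebuild Status\s*:\s*(\d+)%\s*complete")


def rebuild_percent(detail_text: str) -> Optional[int]:
    """Return the rebuild percentage, or None if no rebuild line is present."""
    match = _REBUILD_RE.search(detail_text)
    return int(match.group(1)) if match else None
```

The text passed in would be the saved output of mdadm --query --detail for the array in question. A return value of None only means that no rebuild line was found, not that the array is healthy.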

DesktopMan (christian-auby) wrote :

Yes, I know everything seems fine now, but at boot it refused to start; I had to manually re-add one of the drives and mdadm --run it.

The crash has happened twice. I'm currently checking if it also occurs with the old kernel (2.6.17-11).

The reason I upgraded in the first place was to prevent data corruption on raid5 with cryptsetup, as reported for kernels < 2.6.19.

G.Koehler (support-softmill) wrote :

Same problem here. A RAID-5 array made from 3 partitions (not whole disks), sda3 + sdb2 + sdc1, randomly sets one of the devices to "removed". After a rebuild everything's fine. It takes about one day to strike again. I have not yet figured out whether it correlates with disk stress.

DesktopMan (christian-auby) wrote :

I can confirm that everything works as it should on 2.6.17-11

I haven't come any closer to figuring out what happens on 2.6.20-15 and 2.6.20-16. Sometimes it crashes on mount, sometimes it can work for hours.

Kees Cook (kees) wrote :

Is this a kernel crash? (i.e. the system is hung?) If so, can you capture the crash output and attach it to this bug? Without more information, it's not clear how to proceed with reproducing or diagnosing this bug. Thanks!

DesktopMan (christian-auby) wrote :

Being somewhat new to Linux, I'm not sure what constitutes a kernel crash. Everything locks up completely, sysrq does not work, and there is no output to the monitor.

Kees Cook (kees) wrote :

G.Koehler, are you able to catch any portion of the kernel crash? It will be hard to move forward with this bug without some way to reproduce it or see specifically what areas are failing. Thanks!

G.Koehler (support-softmill) wrote :

Well, the crazy thing is that it did not happen again, not even under heavy stress, which led to crashes before.
The only explanation I can find is that maybe it's gone since I downloaded and installed all available updates, as Ubuntu suggested.
I had to take this chance to get going with a project, so from that perspective I am glad it worked.
On the other hand, this is not very useful for tracing back a problem - sorry.
It would be interesting to know if I am the only one who "solved" the problem that way...
So in this situation I think we should leave it as it is, shouldn't we?
Having been in the troubleshooting business for quite a while, this hurts me a little, as it is NOT the professional way to solve such problems.
If I can find ANY useful hint, I'll let you know for sure.

Thanks and best regards from Austria.
Gerald

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On behalf of Kees Cook
Sent: Monday, 18 June 2007 18:15
To: SOFTMILL - Ing. Gerald Köhler
Subject: [Spam found!] [Bug 110304] Re: Feisty crash possibly related to mdadm/raid5
Importance: Low

G.Koehler. are you able to catch any portion of the kernel crash? It will be hard to move forward with this bug without some way to reproduce it or see specifically what areas are failing. Thanks!

Kees Cook (kees) wrote :

I'm going to close this bug for now, since there doesn't seem to be a good way to diagnose (or reproduce) this problem at the moment. Thanks for everyone's input. If a backtrace from the kernel or a series of steps to reproduce this problem become available, please feel free to reopen the bug. Thanks!

Changed in linux-source-2.6.20:
assignee: keescook → nobody
status: Needs Info → Rejected