Heavy samba usage causes kernel OOPs

Bug #175643 reported by Ryan T. Sammartino
Affects                       Status   Importance  Assigned to  Milestone
linux (Ubuntu)                Invalid  Unknown     Unassigned
linux-source-2.6.22 (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

Running a fresh install of Ubuntu Server 7.10

Heavy Samba usage triggers the following oops very easily, usually within 5 minutes of heavy activity:

Dec 11 17:26:08 ps3-build kernel: [ 723.552774] BUG: unable to handle kernel paging request at virtual address 7268e60c
Dec 11 17:26:08 ps3-build kernel: [ 723.552977] printing eip:
Dec 11 17:26:08 ps3-build kernel: [ 723.553069] c017cb50
Dec 11 17:26:08 ps3-build kernel: [ 723.553070] *pdpt = 0000000033c6a001
Dec 11 17:26:08 ps3-build kernel: [ 723.553167] *pde = 0000000000000000
Dec 11 17:26:08 ps3-build kernel: [ 723.553265] Oops: 0002 [#1]
Dec 11 17:26:08 ps3-build kernel: [ 723.553357] SMP
Dec 11 17:26:08 ps3-build kernel: [ 723.553509] Modules linked in: ipv6 af_packet sbp2 parport_pc lp parport loop snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd xpad heci soundcore psmouse serio_raw shpchp pci_hotplug pcspkr e1000_ich9 evdev snd_page_alloc intel_agp agpgart sr_mod cdrom usbhid hid ext3 jbd mbcache sg sd_mod pata_marvell ohci1394 ieee1394 sata_sil ata_piix ata_generic libata scsi_mod uhci_hcd ehci_hcd usbcore dm_mirror dm_snapshot dm_mod thermal processor fan fuse apparmor commoncap
Dec 11 17:26:08 ps3-build kernel: [ 723.556245] CPU: 2
Dec 11 17:26:08 ps3-build kernel: [ 723.556245] EIP: 0060:[add_partial+32/64] Not tainted VLI
Dec 11 17:26:08 ps3-build kernel: [ 723.556246] EFLAGS: 00210002 (2.6.22-14-server #1)
Dec 11 17:26:08 ps3-build kernel: [ 723.556550] EIP is at add_partial+0x20/0x40
Dec 11 17:26:08 ps3-build kernel: [ 723.556652] eax: c166f258 ebx: c0362b34 ecx: c0362b40 edx: 7268e608
Dec 11 17:26:08 ps3-build kernel: [ 723.556769] esi: c166f240 edi: c0362b20 ebp: f89a1d0b esp: f01cfe78
Dec 11 17:26:08 ps3-build kernel: [ 723.556886] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Dec 11 17:26:08 ps3-build kernel: [ 723.556998] Process smbd (pid: 5551, ti=f01ce000 task=f52e54c0 task.ti=f01ce000)
Dec 11 17:26:08 ps3-build kernel: [ 723.557119] Stack: f3792a80 c166f240 c017d871 f3e0c1de f3e0c1c0 f7d87000 f89a1df9 3db5ef46
Dec 11 17:26:08 ps3-build kernel: [ 723.557581] c166f240 00200287 c0362b20 f5bba260 c017e31e f89a1d0b f3792a88 00000000
Dec 11 17:26:08 ps3-build kernel: [ 723.558043] f0578148 f89a1d0b f5bba260 314b68cf 00000000 f1121480 f89a2031 f3e0c1c0
Dec 11 17:26:08 ps3-build kernel: [ 723.558505] Call Trace:
Dec 11 17:26:08 ps3-build kernel: [ 723.558673] [__slab_free+273/672] __slab_free+0x111/0x2a0
Dec 11 17:26:08 ps3-build kernel: [ 723.558812] [<f89a1df9>] call_filldir+0x89/0xe0 [ext3]
Dec 11 17:26:08 ps3-build kernel: [ 723.558958] [kfree+126/192] kfree+0x7e/0xc0
Dec 11 17:26:08 ps3-build kernel: [ 723.559091] [<f89a1d0b>] free_rb_tree_fname+0x3b/0x80 [ext3]
Dec 11 17:26:08 ps3-build kernel: [ 723.559239] [<f89a1d0b>] free_rb_tree_fname+0x3b/0x80 [ext3]
Dec 11 17:26:08 ps3-build kernel: [ 723.559387] [<f89a2031>] ext3_readdir+0xf1/0x650 [ext3]
Dec 11 17:26:08 ps3-build kernel: [ 723.559531] [cp_new_stat64+249/272] cp_new_stat64+0xf9/0x110
Dec 11 17:26:08 ps3-build kernel: [ 723.559670] [filldir64+0/224] filldir64+0x0/0xe0
Dec 11 17:26:08 ps3-build kernel: [ 723.559808] [current_fs_time+65/80] current_fs_time+0x41/0x50
Dec 11 17:26:08 ps3-build kernel: [ 723.559948] [filldir64+0/224] filldir64+0x0/0xe0
Dec 11 17:26:08 ps3-build kernel: [ 723.560083] [filldir64+0/224] filldir64+0x0/0xe0
Dec 11 17:26:08 ps3-build kernel: [ 723.560217] [vfs_readdir+148/176] vfs_readdir+0x94/0xb0
Dec 11 17:26:08 ps3-build kernel: [ 723.560354] [sys_getdents64+111/192] sys_getdents64+0x6f/0xc0
Dec 11 17:26:08 ps3-build kernel: [ 723.560494] [sysenter_past_esp+107/161] sysenter_past_esp+0x6b/0xa1
Dec 11 17:26:08 ps3-build kernel: [ 723.560636] =======================
Dec 11 17:26:08 ps3-build kernel: [ 723.560733] Code: b6 00 00 00 00 8d bf 00 00 00 00 83 ec 08 89 1c 24 89 c3 89 74 24 04 89 d6 e8 3d f1 17 00 8b 53 0c 8d 46 18 8d 4b 0c 83 43 04 01 <89> 42 04 89 56 18 89 48 04 89 43 0c b8 01 00 00 00 86 03 8b 1c
Dec 11 17:26:08 ps3-build kernel: [ 723.563115] EIP: [add_partial+32/64] add_partial+0x20/0x40 SS:ESP 0068:f01cfe78
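
A note on reading the trace above, offered as a sketch rather than a diagnosis. On 32-bit x86 the low three bits of the oops error code describe the fault, and the bytes marked `<89> 42 04` in the Code line appear to encode a store of eax to the address edx+4. The snippet below only checks that arithmetic against the values printed in the log; the helper name `decode_x86_page_fault` is made up for illustration.

```python
# Values copied from the oops log above.
fault_addr = 0x7268e60c   # "virtual address 7268e60c"
edx = 0x7268e608          # "edx: 7268e608"
error_code = 0x0002       # "Oops: 0002"


def decode_x86_page_fault(code):
    """Decode the low three bits of an x86 page-fault error code.

    Hypothetical helper for illustration only.
    """
    return {
        "page_present": bool(code & 0x1),  # 0 = no page mapped at that address
        "write": bool(code & 0x2),         # 1 = the access was a write
        "user_mode": bool(code & 0x4),     # 0 = the fault happened in kernel mode
    }


info = decode_x86_page_fault(error_code)
offset = fault_addr - edx

print(info)
print(hex(offset))
```

If this reading is right, the kernel faulted while writing through a list pointer held in edx inside add_partial, and that pointer does not look like a kernel address on this machine (the other registers hold values in the c0000000 to f8ffffff range). That would be consistent with a corrupted slab list entry, though it does not say what corrupted it.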

Here are the relevant versions I have installed:

samba 3.0.26a-1ubuntu2.2

Linux ps3-build 2.6.22-14-server #1 SMP Sun Oct 14 23:34:23 GMT 2007 i686 GNU/Linux

Steps to reproduce:

1) Share your home directory with a WinXP machine. Say your home directory is mounted as Z: on the WinXP machine.
2) Sync a very large Perforce depot onto Z:. This is usually enough.
3) If the sync succeeds, build a very large Visual Studio project via Z:. This will certainly kill it.
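
For anyone without a Perforce depot or Visual Studio project to hand, the following is a rough local approximation of that workload, not the procedure used in this report. It assumes the relevant load is heavy file creation, directory reading, and deletion (the traces in this report go through sys_getdents64 and do_unlinkat). It does not involve Samba or the network at all, so it may well fail to trigger the problem. The function name `churn_directory` and its parameters are invented for this sketch.

```python
import os
import tempfile


def churn_directory(root, files_per_round=200, rounds=3, size=4096):
    """Create, list, stat, and delete many small files under root.

    Returns the total number of directory entries seen while listing.
    """
    payload = b"x" * size
    listed = 0
    for r in range(rounds):
        created = []
        for i in range(files_per_round):
            path = os.path.join(root, f"f{r}_{i}.tmp")
            with open(path, "wb") as fh:
                fh.write(payload)
            created.append(path)
        # Listing the directory exercises the kernel's directory-read path.
        with os.scandir(root) as entries:
            for entry in entries:
                entry.stat()
                listed += 1
        # Deleting the files exercises the unlink path.
        for path in created:
            os.unlink(path)
    return listed


if __name__ == "__main__":
    # Uses a throwaway temporary directory; point root at the shared
    # filesystem instead to load the disk that the Samba share lives on.
    with tempfile.TemporaryDirectory() as scratch:
        print(churn_directory(scratch))
```

Raising `files_per_round` and `rounds`, or running several copies in parallel, makes the load heavier.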

Now, I had a similar problem with Fedora Core 8 (see https://bugzilla.redhat.com/show_bug.cgi?id=419911), so this is either:

- a common kernel bug
- a hardware issue

The hardware is a brand new quad core machine.

Ryan T. Sammartino (ryan-sammartino) wrote :

Sorry, hit commit too soon.

The commonalities with that linux-kernel bug report are:

- We both have SMP systems
- We both have Marvell controllers:

$ sudo lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)
00:03.0 Communication controller: Intel Corporation 82G33/G31/P35/P31 Express MEI Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82801I (ICH9 Family) Gigabit Ethernet Controller (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800 GTX] (rev a2)
03:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101 single-port PATA133 interface (rev b2)
07:02.0 RAID bus controller: Silicon Image, Inc. Adaptec AAR-1210SA SATA HostRAID Controller (rev 02)
07:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)

But I'm not using RAID or reiserfs.

Ryan T. Sammartino (ryan-sammartino) wrote :

I downgraded to 7.04, and it took 35 minutes to get this:

Dec 12 15:33:03 ps3-build kernel: [ 3503.904521] Bad page state in process 'smbd'
Dec 12 15:33:03 ps3-build kernel: [ 3503.904523] page:c15448e0 flags:0x40000004 mapping:00000000 mapcount:-524288 count:0
Dec 12 15:33:03 ps3-build kernel: [ 3503.904524] Trying to fix it up, but a reboot is needed
Dec 12 15:33:03 ps3-build kernel: [ 3503.904524] Backtrace:
Dec 12 15:33:03 ps3-build kernel: [ 3503.905008] [bad_page+106/176] bad_page+0x6a/0xb0
Dec 12 15:33:03 ps3-build kernel: [ 3503.905016] [free_hot_cold_page+355/368] free_hot_cold_page+0x163/0x170
Dec 12 15:33:03 ps3-build kernel: [ 3503.905020] [try_to_free_buffers+83/144] try_to_free_buffers+0x53/0x90
Dec 12 15:33:03 ps3-build kernel: [ 3503.905028] [__pagevec_free+31/48] __pagevec_free+0x1f/0x30
Dec 12 15:33:03 ps3-build kernel: [ 3503.905033] [release_pages+105/368] release_pages+0x69/0x170
Dec 12 15:33:03 ps3-build kernel: [ 3503.905041] [__pagevec_release+21/32] __pagevec_release+0x15/0x20
Dec 12 15:33:03 ps3-build kernel: [ 3503.905045] [truncate_inode_pages_range+509/736] truncate_inode_pages_range+0x1fd/0x2e0
Dec 12 15:33:03 ps3-build kernel: [ 3503.905051] [<f8adba8b>] xfs_vn_unlink+0x3b/0x60 [xfs]
Dec 12 15:33:03 ps3-build kernel: [ 3503.905077] [truncate_inode_pages+23/32] truncate_inode_pages+0x17/0x20
Dec 12 15:33:03 ps3-build kernel: [ 3503.905082] [generic_delete_inode+261/288] generic_delete_inode+0x105/0x120
Dec 12 15:33:03 ps3-build kernel: [ 3503.905087] [iput+92/112] iput+0x5c/0x70
Dec 12 15:33:03 ps3-build kernel: [ 3503.905090] [do_unlinkat+239/336] do_unlinkat+0xef/0x150
Dec 12 15:33:03 ps3-build kernel: [ 3503.905099] [sysenter_past_esp+105/157] sysenter_past_esp+0x69/0x9d
Dec 12 15:33:03 ps3-build kernel: [ 3503.905105] =======================

This is with 2.6.20-16-server and samba 3.0.24-2ubuntu1.4
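
One observation on the "Bad page state" line, under a stated assumption: that kernels of this era print the page's raw map counter plus one, and that the raw counter of an unmapped page is -1 (all 32 bits set). If that holds, the printed value of -524288 corresponds to a raw counter that differs from the expected one in a single bit. The sketch below just does that arithmetic.

```python
# Value copied from the log above: "mapcount:-524288".
printed_mapcount = -524288

# Assumption: printed value = raw counter + 1, so undo the +1 and view the
# result as an unsigned 32-bit word.
raw = (printed_mapcount - 1) & 0xFFFFFFFF

# Assumption: an unmapped page has a raw counter of -1, i.e. 0xFFFFFFFF.
expected_raw = 0xFFFFFFFF

# XOR shows which bits differ between the observed and expected words.
diff = raw ^ expected_raw
flipped_bits = [bit for bit in range(32) if (diff >> bit) & 1]

print(hex(raw))
print(flipped_bits)
```

A single differing bit is the sort of pattern often associated with faulty memory or other hardware corruption, which would fit the hardware hypothesis raised earlier in this report, but it is not proof on its own.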

Ryan T. Sammartino (ryan-sammartino) wrote :

Looks like other people with the same ICH9 chipset are having trouble:

http://<email address hidden>/msg236612.html

Ryan T. Sammartino (ryan-sammartino) wrote :

Here is a list of things I've tried to work around this; none helped:

- Unplugged the disks from the motherboard and plugged them into the Adaptec controller
- Installed a different NIC (Intel 82541PI Gigabit), shutting off the one on the motherboard
- Switched all partitions from xfs to ext3

Any ideas on how to get this thing stable are greatly appreciated.

Leann Ogasawara (leannogasawara) wrote :

The Hardy Heron Alpha2 release will be coming out soon. It will have an updated version of the kernel. It would be great if you could test with this new release and verify if this issue still exists. I'll be sure to update this report when Alpha2 is available. Thanks!

Changed in linux:
status: New → Incomplete
Ryan T. Sammartino (ryan-sammartino) wrote :

I tried the Hardy kernel linux-image-2.6.24-1-server and it didn't help.

Unfortunately, since I was unable to get a stable configuration working, the IT department have taken the machine away so I no longer have access to it.

Leann Ogasawara (leannogasawara) wrote :

Hi Ryan,

Sorry we couldn't get some resolution to your bug quicker. However, since you no longer have access to the machine, I'm going to close this report for now. Please continue to report any future bugs you may find; we really do appreciate it. Thanks!

Changed in linux:
importance: Undecided → Unknown
status: Incomplete → Invalid
Changed in linux-source-2.6.22:
status: New → Invalid