NFS load locks processes and mounts

Bug #688437 reported by Bill M
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have multiple Onstor NFS NAS units serving multiple Ubuntu clients. The clients are all LTS releases (Dapper, Hardy and now Lucid) and run autofs to mount the exports from the NAS. On my new Lucid machine, heavy NFS traffic generated by the client makes the mounts lock up.

- Processes doing NFS file operations lock up (State D). Can't SIGTERM, SIGHUP, SIGKILL the affected process.
- The affected mount can't be unmounted unless I use umount -fl /mountpoint
- Once the mount is killed I can SIGKILL the processes.
- The process performing IO on the affected mount never dies or exits to shell. (zombies)
- Other processes performing IO on other mounts exit normally once the affected mount is unmounted
- Other clients are unaffected by this condition
- Once a mount is affected by this condition it is no longer mountable by either autofs or manual mount
- Other mounts on that same NAS are no longer mountable either.
- I can still ping the affected NAS and run showmount -e against it
- Have to echo b > /proc/sysrq-trigger in order to reboot.
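
To capture which processes are stuck when this happens, something like the following may help. It is only a sketch: list_dstate is a hypothetical helper name, and it assumes the procps version of ps, whose STAT column starts with D for tasks in uninterruptible sleep.

```shell
# Hypothetical helper: reads `ps -eo pid,stat,wchan:32,args` output on stdin
# and prints only the rows whose STAT column starts with D
# (uninterruptible sleep). The header row is skipped.
list_dstate() {
    awk 'NR > 1 && $2 ~ /^D/'
}
```

It would be fed from ps, for example: ps -eo pid,stat,wchan:32,args | list_dstate. The WCHAN column shows the kernel function each task is sleeping in, which is probably worth attaching to the report.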

This happens on every Lucid server kernel I've tried (even the current mainline kernel).

To reproduce:
1. Set up the automounter on 4 different shares on 4 different servers.
2. cd into those 4 shares and run iozone -a in each.
3. Wait for the processes to stall (state D).
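
The steps above could be scripted roughly as follows. This is a sketch under assumptions: run_in_each is a hypothetical helper, not part of autofs or iozone, and it assumes the command can be passed as a single string and word-split by the shell.

```shell
# Hypothetical helper: run_in_each CMD DIR...
# Starts CMD in each DIR in parallel (entering an automounted directory
# triggers the mount), then waits for every background job to finish.
run_in_each() {
    cmd=$1
    shift
    for dir in "$@"; do
        # $cmd is intentionally unquoted so "iozone -a" splits into words
        ( cd "$dir" && $cmd ) &
    done
    wait
}
```

With the setup described here, the call would be along the lines of run_in_each "iozone -a" /mnt/slow/vol08 followed by the three other automounted shares. If the hang occurs, the final wait never returns.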

/proc/mounts:
vsvr-4.nfs:/slow08 /mnt/slow/vol08 nfs rw,nosuid,noatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,posix,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.10.233,mountvers=3,mountproto=tcp,addr=192.168.10.233 0 0

/etc/auto.slow:
vol08 -rw,nosuid,posix,proto=tcp,vers=3,rsize=32768,wsize=32768,hard,intr,noatime,timeo=600 vsvr-4.nfs:/slow08

/etc/auto.master:
/mnt/slow /etc/auto.slow -nosuid
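
For reading the options above: timeo is expressed in tenths of a second, so timeo=600 is a 60-second timeout, and on a hard mount the client keeps retrying indefinitely instead of returning an error to the process. A small sketch for pulling one option out of /proc/mounts on each client (mount_opt is a hypothetical helper name):

```shell
# Hypothetical helper: mount_opt OPTION
# Reads /proc/mounts-style lines on stdin and, for every nfs entry,
# prints "mountpoint value" for the requested key=value option.
mount_opt() {
    awk -v opt="$1" '$3 ~ /^nfs/ {
        n = split($4, o, ",")
        for (i = 1; i <= n; i++) {
            # match only options that begin with "opt=", so "addr"
            # does not also match "mountaddr"
            if (index(o[i], opt "=") == 1)
                print $2, substr(o[i], length(opt) + 2)
        }
    }'
}
```

Usage would be, for example, mount_opt timeo < /proc/mounts, which makes it easy to compare the effective options on the Lucid client against the Dapper and Hardy clients.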

showmount -e vsvr-5.nfs
Export list for vsvr-5.nfs:
/slow11 *
/slow15 *

rpcinfo -p vsvr-5.nfs
   program vers proto port
    100000 2 udp 111 portmapper
    100000 2 tcp 111 portmapper
    100003 2 udp 2049 nfs
    100003 2 tcp 2049 nfs
    100003 3 udp 2049 nfs
    100003 3 tcp 2049 nfs
    100005 1 udp 2087 mountd
    100005 1 tcp 2087 mountd
    100005 2 udp 2087 mountd
    100005 2 tcp 2087 mountd
    100005 3 udp 2087 mountd
    100005 3 tcp 2087 mountd
    100021 1 udp 2090 nlockmgr
    100021 1 tcp 2090 nlockmgr
    100021 3 udp 2090 nlockmgr
    100021 3 tcp 2090 nlockmgr
    100021 4 udp 2090 nlockmgr
    100021 4 tcp 2090 nlockmgr
    100024 1 udp 2092 status
    100024 1 tcp 2092 status
    100333 1 udp 2049
    100333 1 tcp 2049
    100333 2 udp 2049
    100333 2 tcp 2049
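
The listing shows NFS versions 2 and 3 registered over both UDP and TCP, which matches the vers=3,proto=tcp mount options. A sketch for checking this from a script once the hang has occurred (has_rpc is a hypothetical helper name):

```shell
# Hypothetical helper: has_rpc PROGRAM VERSION PROTO
# Reads `rpcinfo -p` output on stdin and exits 0 if a matching
# program/version/protocol row is present, 1 otherwise.
has_rpc() {
    awk -v p="$1" -v v="$2" -v t="$3" '
        $1 == p && $2 == v && $3 == t { found = 1 }
        END { exit !found }'
}
```

For example, rpcinfo -p vsvr-5.nfs | has_rpc 100003 3 tcp succeeds when the server still advertises NFSv3 over TCP.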

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-26-virtual 2.6.32-26.48 [modified: lib/modules/2.6.32-26-server/modules.seriomap lib/modules/2.6.32-26-server/modules.pcimap lib/modules/2.6.32-26-server/modules.alias lib/modules/2.6.32-26-server/modules.dep lib/modules/2.6.32-26-server/modules.alias.bin lib/modules/2.6.32-26-server/modules.symbols lib/modules/2.6.32-26-server/modules.isapnpmap]
Regression: No
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-26.48-server 2.6.32.24+drm33.11
Uname: Linux 2.6.32-26-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
Date: Fri Dec 10 00:14:16 2010
InstallationMedia: Ubuntu-Server 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:

ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-26-server root=UUID=f8dabf9f-1200-495e-bc07-25752be8d3e1 ro ipv6.disable=1 quiet
ProcEnviron:
 LANGUAGE=en
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 03/19/2009
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd03/19/2009:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Bill M (billmoritz) wrote :

Anyone? This bug was submitted a month ago.

tags: added: kj-triage
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: lucid
penalvch (penalvch) wrote :

Bill M, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command in the development release from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux <replace-with-bug-number>

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. As well, please comment on which kernel version specifically you tested.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream', and comment as to why specifically you were unable to test it.

Please let us know your results. Thanks in advance.

Helpful Bug Reporting Links:
https://help.ubuntu.com/community/ReportingBugs#Bug_Reporting_Etiquette
https://help.ubuntu.com/community/ReportingBugs#A3._Make_sure_the_bug_hasn.27t_already_been_reported
https://help.ubuntu.com/community/ReportingBugs#Adding_Apport_Debug_Information_to_an_Existing_Launchpad_Bug
https://help.ubuntu.com/community/ReportingBugs#Adding_Additional_Attachments_to_an_Existing_Launchpad_Bug

tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired