AFS gets stuck in an infinite loop, 100% cpu, can't kill -9

Bug #199420 reported by Soren Hansen
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

I've seen this a few times but have not been able to pinpoint the exact problem. Still, I think it's better to have a bug report to help track it.

The issue is that in certain cases I end up with a process that's in Running state, seemingly running at 100% in userspace, but can't be kill -9'ed. It seems to occur mostly when a process exhausts certain resources.

I've seen it happen in a VM (using kvm) with 128MB RAM using the alternate installer, where localedef ends up in this state (I'm not entirely sure, but I suspect that it eats more RAM than is available). The server install is unaffected (fewer applications installed, so localedef hasn't as much to do, I suppose).

I've also seen the same behaviour happen while trying to build java inside an sbuild where it ran out of disk space. I have a suspicion that that particular case might be more related to schroot pulling away the snapshot lv from underneath the process, but I figured I'd mention it anyway in case they might be related.

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
Ike Panhc (ikepanhc) wrote : Re: Running process cannot be kill -9'ed

If a process is accessing an I/O device, you cannot force it to close until the I/O command has completed.

Could you point out what kind of resource the process is using, so that we can tell whether the process is in disk sleep state?
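
One way to answer this question is to read the state letter the kernel reports for the process. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the helper names parse_state and process_state are made up for illustration. It separates R (running or runnable) from D (uninterruptible "disk sleep"), which is the distinction being asked about here.

```python
from pathlib import Path

# Common state letters from the third field of /proc/<pid>/stat.
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually waiting on I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_state(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' instead of on whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid: int) -> str:
    """Read and parse the state of a live process (hypothetical helper)."""
    return parse_state(Path(f"/proc/{pid}/stat").read_text())
```

For example, STATE_NAMES.get(process_state(pid), "other") gives a readable description for a given pid. A process that shows R but still ignores kill -9 is not simply waiting in disk sleep.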

Changed in linux (Ubuntu):
importance: Medium → Low
status: Triaged → Incomplete
Revision history for this message
kernel-janitor (kernel-janitor) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
Soren Hansen (soren) wrote :

As I said: It's in Running state, so it's not doing I/O. It's running entirely in user space and should be killable, but isn't. Reopening.

Changed in linux (Ubuntu):
status: Invalid → Triaged
Yani Raafezaj (ytraaf)
Changed in linux (Ubuntu):
status: Triaged → New
Revision history for this message
Yani Raafezaj (ytraaf) wrote :

I'm pretty sure this is related...

As of 12 April 2010 I'm having this same issue of immortal processes. I had a program (EditiX) running and it suddenly stopped responding. I tried to kill it using the process manager, but I had two problems:

1) Neither the process nor anything resembling its name was on the list. Oddly enough I could still move the window around and click on buttons but they wouldn't do anything.
2) I found a completely unrelated process--Adobe Reader--whose status was listed as "Zombie." I tried to kill it with the process manager, but it did nothing. I also tried kill -9 <pid> but that didn't do anything either, nor did pkill.

System information:
Ubuntu 9.04 (Jaunty) release 5.0
GNOME 2.26.1
Kernel: 2.6.28-18-generic
Troublemaker processes: Adobe Reader (acroread) and EditiX 2009.
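
A note on the "Zombie" status in point 2: a zombie has already exited, so kill -9 has nothing left to terminate. The entry goes away only once the parent process collects the exit status. The sketch below is an illustration under assumptions (Linux, /proc mounted, a platform where os.fork is available), and the helper names state_of and zombie_demo are hypothetical.

```python
import os
import signal
import time


def state_of(pid):
    """Return the one-letter state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]


def zombie_demo():
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without cleanup

    # Parent: wait (up to 5 s) for the child to exit, but do not reap it yet.
    deadline = time.monotonic() + 5.0
    while state_of(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = state_of(pid)

    # The signal is accepted, but the process is already dead.
    os.kill(pid, signal.SIGKILL)
    after_kill = state_of(pid)

    # Reaping the child is what removes the zombie entry.
    os.waitpid(pid, 0)
    gone = not os.path.exists(f"/proc/{pid}")
    return before, after_kill, gone
```

This is a different situation from the one in the original report, where the process is in R state and still consuming CPU.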

Revision history for this message
Leo (leorolla) wrote :

Diego, if you can reproduce the problem reported by someone else, you can set the bug to confirmed.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
lxop (lxop) wrote :

I'm on Debian, but I've just encountered this. I'm running du, and it is just sitting there in R state, using 100% CPU:

  PID USER PR NI VIRT RES SHR S %CPU %MEM     TIME+ COMMAND
12047 alex 39 19 5656 496 408 R 99.9  0.0 219:12.08 du

kill -9 doesn't work, kill -STOP doesn't stop it. Can't attach to it with gdb.

Debian Unstable
Kernel 3.2.0-3-rt-amd64
coreutils version 8.13-3
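
When kill -9 and kill -STOP appear to do nothing, it can help to check whether the signals were queued but never acted on. The sketch below is an assumption-laden illustration, not a diagnosis of this bug: it decodes the SigPnd (per-thread) and ShdPnd (process-wide) hex masks from /proc/<pid>/status, where bit n-1 stands for signal n. The function names are hypothetical.

```python
def decode_sigmask(hex_mask: str) -> list[int]:
    """Turn a hex signal mask into a list of signal numbers."""
    mask = int(hex_mask, 16)
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def pending_signals(status_text: str) -> dict[str, list[int]]:
    """Extract pending signals from the text of /proc/<pid>/status."""
    result = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigPnd", "ShdPnd"):
            result[key] = decode_sigmask(value.strip())
    return result
```

If signal 9 (SIGKILL) shows up in either field and stays there, the signal was queued but the task has not reached a point where it handles it, which would be consistent with a task stuck inside kernel code.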

Revision history for this message
Alexei Colin (alexei.colin) wrote :

You can press Alt-SysRq-l to dump a kernel stack trace to the kernel log (dmesg). See <kernel-source>/Documentation/sysrq.txt.

When I did this on my machine with 'firefox' in the same state, it revealed try_to_wake_up/afs_cv_wait in the openafs module. I'm not completely sure, but it is probably spinning somewhere in the networked file system code.
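
The Alt-SysRq-l key combination only works if the kernel.sysrq setting permits debugging dumps. As a rough sketch, assuming the bitmask layout described in the kernel's sysrq documentation (1 enables everything, bit 0x8 enables process and CPU dumps), the check could look like this. The function names are made up for illustration.

```python
SYSRQ_ENABLE_DUMP = 0x0008  # bit for debugging dumps, per the kernel sysrq docs


def dump_allowed(sysrq_value: int) -> bool:
    """True if the keyboard SysRq 'l' backtrace is permitted by this value."""
    return sysrq_value == 1 or bool(sysrq_value & SYSRQ_ENABLE_DUMP)


def read_sysrq_setting(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current kernel.sysrq value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

With a value such as 176 (sync, remount read-only and reboot only), dump_allowed returns False and the key combination will not produce a backtrace. As far as I know, root can still request the same dump by writing the letter l to /proc/sysrq-trigger, since that interface is not restricted by the bitmask.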

Revision history for this message
Phillip Susi (psusi) wrote :

Can you provide the full alt-sysrq-l stacktrace?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Phillip Susi (psusi) wrote :

And firefox is the runaway process? Can you show a ps -l on it?

Revision history for this message
Alexei Colin (alexei.colin) wrote :

Yes, firefox was the unkillable process in that case. I did save a tiny shell log at the time:

acolin@thinkpad ~$ ps aux | grep firefox
acolin 3904 51.4 8.9 774456 276596 ? Rs 18:06 18:20 /usr/lib/firefox/firefox
acolin 5689 0.0 0.0 4392 808 pts/3 S+ 18:42 0:00 grep firefox
acolin@thinkpad ~$ kill -9 3904
acolin@thinkpad ~$ ps aux | grep firefox
acolin 3904 51.6 8.9 774456 276596 ? Ss 18:06 18:30 /usr/lib/firefox/firefox
acolin 5693 0.0 0.0 4392 808 pts/3 S+ 18:42 0:00 grep firefox

Sadly, I don't have any more information and don't have any hopes of reproducing this, but hopefully the stack trace sheds light on possible causes.

Revision history for this message
Phillip Susi (psusi) wrote :

That should be helpful. Could you also describe your AFS setup a bit?

Changed in linux (Ubuntu):
importance: Low → Medium
status: Incomplete → Triaged
summary: - Running process cannot be kill -9'ed
+ AFS gets stuck in an infinite loop, 100% cpu, can't kill -9
Revision history for this message
penalvch (penalvch) wrote :

Soren Hansen, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc7

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: needs-kernel-logs needs-upstream-testing
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired