PulseAudio gets killed mysteriously on RT kernels

Bug #367671 reported by Josh Green
This bug affects 7 people
Affects              Status     Importance  Assigned to  Milestone
linux-rt (Ubuntu)    Confirmed  Undecided   Unassigned
pulseaudio (Ubuntu)  Won't Fix  Undecided   Unassigned

Bug Description

Binary package hint: pulseaudio

I'm using Ubuntu Studio 9.04 and have enabled PulseAudio (0.9.14-0ubuntu20) with real time enabled. While playing music I noticed that the audio output would suddenly stop, though Amarok kept on as if everything was fine. PulseAudio was no longer running. When I ran it from the command line I observed the same behavior, and the last output line from PulseAudio was "Killed". By attaching strace with -e trace=signal I found that it would occasionally get a SIGXCPU signal. On the 3rd signal, pulseaudio gets killed, presumably by the Linux kernel.

Setting "no-cpu-limit = yes" in /etc/pulse/daemon.conf got rid of the SIGXCPU signals, but they don't seem to have been the problem, since PulseAudio is still getting a SIGKILL signal. One additional note is that I changed the resample-method to speex-float-1 from the default of src-linear. I'm not experiencing any perceptible system lockup when this occurs, so I don't think the RT PulseAudio is taking up extended amounts of CPU. I'm at a loss as to where the SIGKILL is coming from.

Daniel T Chen (crimsun) wrote : Re: [Bug 367671] Re: PulseAudio gets killed by SIGXCPU

No, this symptom is really a by-product of bug 345627. See the kernels
at http://kernel.ubuntu.com/~dtchen/


Josh Green (josh-resonance) wrote : Re: PulseAudio gets killed by SIGXCPU

I spoke too soon in regards to "no-cpu-limit = yes" fixing the issue. It's not fixed and is still occurring on my system. It's no longer receiving SIGXCPU signals, but it does still get a SIGKILL, the source of which I am still unsure of.

Daniel T Chen: How can you be sure it's a by-product of bug #345627? The description of that bug sounds nothing like what I am experiencing. The audio output is totally fine on my system (no crackles or pops). Perhaps you posted the wrong bug link?

Bug #366708 seems like it could be the same, though that could just be a PulseAudio crash. In this case, it's not crashing, it's getting killed.

description: updated
summary: - PulseAudio gets killed by SIGXCPU
+ PulseAudio gets killed mysteriously
Daniel T Chen (crimsun) wrote : Re: [Bug 367671] Re: PulseAudio gets killed by SIGXCPU

The underlying issue is that ALSA buffer handling was causing PA to spin.
Those patches go quite some ways toward addressing that cause.

In other words, multiple symptoms are caused by the same root culprit.


Josh Green (josh-resonance) wrote : Re: PulseAudio gets killed mysteriously

I'm still unsure why that would cause PulseAudio to get killed though. Is there some sort of RT process killer running on Ubuntu Studio?

I should probably try the regular "generic" kernel in addition to yours, since just trying yours would be inconclusive, even if it worked, since I'm running the -rt kernel at the moment.

Thanks for the info.

Josh Green (josh-resonance) wrote :

I installed the 2.6.28-11-generic Kernel, rather than the Ubuntu Studio RT Kernel, and so far PulseAudio has been running fine. So I think it is something specific with the RT Kernel. I haven't yet found a concise description of the differences between the two kernel builds though.

Josh Green (josh-resonance) wrote :

Daniel T Chen:
After additional testing I experienced a complete system lockup with the stock Ubuntu 9.04 generic kernel. I suspect it was because I had the PulseAudio option "no-cpu-limit = yes" enabled with SCHED_FIFO RT as well and it got stuck in a loop. So this is starting to sound exactly as you describe. I have been testing your posted kernel for a couple of days now and it's working great. Sorry for doubting you! ;-) It seems the -rt kernel has some sort of SCHED_FIFO lockup protection though, which was killing the PulseAudio task when it got too CPU hungry. I'm curious which particular kernel option this is.

At any rate, this bug can probably just be marked as a duplicate and closed.

martron (imartron) wrote :

I was having the same problem as Josh. Also with RT kernel. I installed the proposed generic kernel linked at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/345627/ and that's fixed my pulseaudio dying problem. Even works well with jack and firewire.

Daniel T Chen (crimsun)
affects: pulseaudio (Ubuntu) → linux (Ubuntu)
Jeremy Foshee (jeremyfoshee) wrote :

Hi Josh,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/lucid.

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 367671

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kernel-sound
tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
DaveF (dfriberg23) wrote :

I'm still encountering this issue, exactly as described by Josh, after my upgrade to Ubuntu 10.04. I'm running kernel linux-image-2.6.31-10-rt.

bojo42 (bojo42)
summary: - PulseAudio gets killed mysteriously
+ PulseAudio gets killed mysteriously on RT kernels
affects: linux (Ubuntu) → linux-rt (Ubuntu)
Changed in linux-rt (Ubuntu):
status: Incomplete → Confirmed
bojo42 (bojo42) wrote :

yep still present on lucid. i also built some packages for a 2.6.33 based rt kernel (https://launchpad.net/~bojo42/+archive/rt) to have support for rtkit (bug #406702) but that doesn't help, even with the latest rt19 patchset. rest of the system is at stock configuration.

bojo42 (bojo42) wrote :

i have good news: i managed to prevent the killing of PA on lucid with an RT kernel. i did it by playing around with the /etc/pulse/daemon.conf:

cpu-limit = no -> PA won't kill itself

rlimit-rttime = 10000000 -> increased the value by adding a zero (x10) -> prevents the kernel from killing PA

i'm not sure how the last value is handled (it describes how long an application is allowed to run at full rt priority), because AFAIK an app shouldn't be allowed to set it on its own. but probably rtkit comes into play here. for more details on my config see the attachment.

would be great if someone can confirm this fix, especially if it also works on the stock RT kernel (2.6.31) that lacks a patch for rtkit.

Changed in pulseaudio (Ubuntu):
status: New → Confirmed
Daniel T Chen (crimsun) wrote :

bojo42, cpu-limit is already disabled by default in Lucid. The rttime rlimit won't be changed; that's a pretty nasty thing to unleash on users for an LTS, and there's a fairly easy to perform workaround.

Changed in pulseaudio (Ubuntu):
status: Confirmed → Won't Fix
ChristieGrinham (christiegrinham) wrote :

This is affecting me in Lucid Lynx with RT kernel. It is very frustrating because if I am trying to watch a film it cuts out every now and then and I have to restart the movie player.

PrototypeX29A (preineke) wrote :

So what is the bug that won't be fixed and what is the easy to perform workaround?

Robert Persson (ireneshusband) wrote :

Bump. What is the workaround?

bojo42 (bojo42) wrote :

just like i said in comment #12: edit /etc/pulse/daemon.conf and change, for example,
; rlimit-rttime = 1000000
to
rlimit-rttime = 10000000
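
A minimal sketch of checking that the setting is actually active, since a leading ';' leaves the line commented out. It assumes that '#' and ';' start comments in daemon.conf and that the value is handed to RLIMIT_RTTIME unchanged, which setrlimit(2) measures in microseconds (so 10000000 would be 10 seconds). The helper names are hypothetical and not part of PulseAudio.

```python
def active_settings(conf_text):
    """Return the uncommented key = value pairs found in daemon.conf text."""
    settings = {}
    for line in conf_text.splitlines():
        # Assumption: '#' and ';' start a comment that runs to end of line.
        for marker in "#;":
            line = line.split(marker, 1)[0]
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def rttime_seconds(conf_text):
    """Return the active rlimit-rttime in seconds, or None if it is not set."""
    value = active_settings(conf_text).get("rlimit-rttime")
    if value is None:
        return None
    # Assumption: the value is in microseconds, as for RLIMIT_RTTIME.
    return int(value) / 1_000_000
```

Passing the contents of /etc/pulse/daemon.conf (or ~/.pulse/daemon.conf) to rttime_seconds() should return None while the line still starts with ';'.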

Bernard Hurley (bernard-marcade) wrote :

bojo42's solution seems to work. But may I suggest you copy /etc/pulse/daemon.conf to ~/.pulse/daemon.conf and then edit that file, which will then take precedence over the system wide file.

bojo42 (bojo42) wrote :

@Bernard: good point, but on multi user machines you have to do that for every user and pulse must not be started system wide. i like /etc because to me such basic system administration belongs there ;)

@all: could you please report if it completely stops the killing on your side? i noticed that with my newer kernels i sometimes still got PA killed, but very rarely. and can someone confirm that it works with the 2.6.31 kernel from universe?

F. Medeiros (excalibas-gmail) wrote :

I have edited /etc/pulse/daemon.conf but the problem still remains

With Ubuntu Lucid + Kernel 2.6.33-29-realtime

bojo42 (bojo42) wrote :

as i said before it doesn't stop completely, but at least it's really less often. please check if you got some different settings in ~/.pulse/daemon.conf.

i am also testing maverick's pulseaudio backported to lucid and it seems that i at least can't remember a killed PA for the last week.

delete (eskei-one) wrote :

i tried bojo42's workaround while running the rt kernel found on his ppa with lucid - no joy. the problem is indeed rarer but mp3 playback is still not acceptable. i also compiled pulseaudio 0.9.22 from source, still no joy. would installing the maverick package make any difference? currently the only workaround i've found is to connect directly to jack, but some applications (e.g. rhythmbox) can only connect to jack through pulseaudio.

bojo42 (bojo42) wrote :

@delete: I tried backported PA and rtkit, but it doesn't make a difference for me. But as I said before, PA gets killed very rarely since I started using the rlimit-rttime config stuff. Could you describe how often PA gets killed on your hardware? What you can also do is try the latest PA config tweaking from Bug #496616 regarding the resample method, or try some other config things.

What we need to do in general is dig further into what exactly is killing PA, or whether it's just PA that is crashing. We also need to find out what the impact of realtime usage is, what's related to Bug #496616, and what role hardware and driver combinations play.

danmb (danmbox) wrote :

Bug #496616 is marked as a duplicate of bug #644644...

danmb (danmbox) wrote :

This is hard to diagnose. I built 0.9.22 packages from natty, then installed pulseaudio-dbg_0.9.22+stable-queue-24-g67d18-0ubuntu3_i386.deb

In ~/.pulse/daemon.conf, I enabled
log-level = debug
log-meta = yes
log-time = yes
log-backtrace = 1

then ran pulseaudio under gdb with

$ killall pulseaudio; rm ~/.pulse-cookie; gdb --args pulseaudio --log-target=stderr

[...]

(1533.966| 36.781) D: [modules/alsa/alsa-sink.c:1496 thread_func()] Wakeup from ALSA! (libpulsecommon-0.9.22.so(pa_log_levelv_meta+0x5e9) [0x1cda59])
(1572.048| 38.082) D: [modules/alsa/alsa-sink.c:1496 thread_func()] Wakeup from ALSA! (libpulsecommon-0.9.22.so(pa_log_levelv_meta+0x5e9) [0x1cda59])
[Thread 0xb7355b70 (LWP 24417) exited]
[Thread 0xb7b68b70 (LWP 24416) exited]

Program terminated with signal SIGKILL, Killed.

So, it's impossible to tell what's happening.

Daniel T Chen (crimsun) wrote : Re: [Bug 367671] Re: PulseAudio gets killed mysteriously on RT kernels

pulseaudio -vvv

See also https://wiki.ubuntu.com/PulseAudio/Log.

danmb (danmbox) wrote :

Doesn't the log-level = debug take care of that, Daniel?

I didn't attach a full log because minutes pass between the last printed message and the SIGKILL.

Is there a way to see how much "rttime" a task has been using (with ps or something else)?

And (less likely) is there a way to configure the kernel to send something like SIGUSR1 instead of SIGKILL and/or log the action and its cause?
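
A partial answer to the first question, as a minimal sketch: on Linux, /proc/<pid>/limits lists the soft and hard limits a process is running with, in a row labelled "Max realtime timeout". This shows only the configured RLIMIT_RTTIME values, not how much real-time CPU the task has consumed so far. The function names are hypothetical.

```python
from pathlib import Path


def parse_rttime_limits(limits_text):
    """Return (soft, hard) from the 'Max realtime timeout' row, or None."""
    for line in limits_text.splitlines():
        if line.startswith("Max realtime timeout"):
            fields = line.split()
            # fields: ['Max', 'realtime', 'timeout', soft, hard, units]
            return fields[3], fields[4]
    return None


def rttime_limits_of(pid):
    """Read the RLIMIT_RTTIME limits of a running process (Linux only)."""
    return parse_rttime_limits(Path(f"/proc/{pid}/limits").read_text())
```

The values come back as strings because the kernel prints "unlimited" when no limit is set; numeric values are in microseconds.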

danmb (danmbox) wrote :

From setrlimit(2): setrlimit allows the user to specify two limits -- a soft one and a hard one. For RLIMIT_RTTIME, upon exceeding the soft limit, the kernel sends a SIGXCPU every second, then a SIGKILL.

Unfortunately, pulseaudio sets the soft limit equal to the hard limit, so no warning is possible.

I'm going to change the code to make the soft limit smaller, then run pulse under gdb and get a backtrace when it receives SIGXCPU.
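
The idea of keeping the soft limit below the hard one can be sketched as follows. This is a Python illustration only, not the change made to PulseAudio (which is written in C); the function names, the 200 ms margin and the 1 s fallback are hypothetical values chosen for the example.

```python
import resource
import signal
import traceback


def soft_below_hard(hard_us, margin_us=200_000, fallback_soft_us=1_000_000):
    """Return a (soft, hard) pair whose soft limit is below the hard limit."""
    if hard_us == resource.RLIM_INFINITY:
        # No hard cap: keep it unlimited, but still pick a finite soft limit.
        return fallback_soft_us, hard_us
    if hard_us > margin_us:
        return hard_us - margin_us, hard_us
    # Hard limit smaller than the margin: fall back to half of it.
    return hard_us // 2, hard_us


def on_sigxcpu(signum, frame):
    # Print the stack that was running when the soft limit was exceeded.
    traceback.print_stack(frame)


def install_early_warning():
    """Lower the soft RLIMIT_RTTIME and report a stack trace on SIGXCPU."""
    rttime = getattr(resource, "RLIMIT_RTTIME", None)  # Linux only
    if rttime is None:
        return None
    _soft, hard = resource.getrlimit(rttime)
    limits = soft_below_hard(hard)
    resource.setrlimit(rttime, limits)
    signal.signal(signal.SIGXCPU, on_sigxcpu)
    return limits
```

With a gap between the two limits, the process receives SIGXCPU before the kernel resorts to SIGKILL, which gives a handler or a debugger a chance to record where the time was being spent.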

David Henningsson (diwic) wrote :

@Dan, since an improved version is in Natty it might make sense to see if upgrading to that version of PulseAudio and/or GStreamer helps to resolve the issue.

David Henningsson (diwic) wrote :

Sorry, I did not read comment #25 carefully enough; however, the comment about upgrading GStreamer still applies.

danmb (danmbox) wrote :

Thanks David. I am streaming from qmmp, and I have completely removed gstreamer0.10-pulseaudio (and, yes, as per comment #25, I have a backported natty pulseaudio).

I have performed the steps I planned in #28 and have obtained several backtraces. They all look the same. Strangely, the kernel sends SIGXCPU (due to real-time CPU usage reaching the soft RLIMIT_RTTIME) in a write() call. The setrlimit(2) man page says that blocking calls reset the RTTIME count.

#0 0x0012d422 in ?? ()
#1 0x00475edb in write () at ../sysdeps/unix/syscall-template.S:82
#2 0x00153a55 in pa_fdsem_post (f=0x806e178) at pulsecore/fdsem.c:205
#3 0x0013cb0f in push (l=0x8071bc8, p=0x8, wait_op=false) at pulsecore/asyncq.c:161
#4 0x0013d291 in pa_asyncq_post (l=0x8071bc8, p=0x807a620) at pulsecore/asyncq.c:203
#5 0x0013c1ee in pa_asyncmsgq_post (a=0x8071a90, object=0x80b3ac8, code=7, userdata=0x0,
    offset=0, chunk=0xbffff570, free_cb=0) at pulsecore/asyncmsgq.c:139
#6 0x009811d7 in pstream_memblock_callback (p=0x80864d8, channel=0, offset=0,
    seek=PA_SEEK_RELATIVE, chunk=0xbffff570, userdata=0x8092228)
    at pulsecore/protocol-native.c:4445
#7 0x001dd20b in ?? () from /usr/lib/libpulsecommon-0.9.22.so
#8 0x001c823e in ?? () from /usr/lib/libpulsecommon-0.9.22.so
#9 0x0021b5fb in pa_mainloop_dispatch () from /usr/lib/libpulse.so.0
#10 0x0021bb11 in pa_mainloop_iterate () from /usr/lib/libpulse.so.0
#11 0x0021bbd4 in pa_mainloop_run () from /usr/lib/libpulse.so.0
#12 0x08052e85 in main (argc=1, argv=0xbffff834) at daemon/main.c:974

Patch attached.

David Henningsson (diwic) wrote :

> Strangely, the kernel sends SIGXCPU (due to real-time CPU usage reaching the soft RLIMIT_RTTIME) in a write() call. The setrlimit(2) man page says that blocking calls reset the RTTIME count.

Ok, thanks for the investigation - perhaps that write call is not considered a blocking call, or not so on RT kernels, or it gets stuck in the for (;;) loop around the write call in pa_fdsem_post - might be worth checking the result from that write call to see what goes wrong?

Happy bug hunting :-)

danmb (danmbox) wrote :

It's an interrupted syscall -- there is no result... And the for loop only repeats on EINTR.

I take it that no support, or at least help, will be forthcoming from the Ubuntu side?

Even though the lead PA developer seems like a smart guy who understands real-time quite well, I'm sorry to say that PA is a stability and usability nightmare, and it has always been so. See the myriad of guides on removing PA from Ubuntu installs.

I think the problem is that PA developers put a low priority on making the software "just work" for the user. They are, however, quite happy to explain what the "right thing" is, how to contact the non-responsive ALSA developers, how to invest hours upon hours debugging Pulse etc.

David Henningsson (diwic) wrote :

> I take it that no support, or at least help, will be forthcoming from the Ubuntu side?

Of course, I cannot speak for the entire Ubuntu community. Ubuntu relies on volunteers such as yourself to help out with all parts of making the distribution. For my own part, I'm following this with great interest, but I'm not running linux-rt and this package has also been removed from maverick/natty.

> PA is a stability and usability nightmare, and it has always been so.

I've been working quite a bit with this during the past few months and hopefully, the PA version shipping with Natty is more stable and usable than ever before!

> I think the problem is that PA developers put a low priority on making the software "just work" for the user.

For the next release, I hope to be able to work a little with "jack detection" ( https://blueprints.launchpad.net/ubuntu/+spec/hwe-o-audio-jack-detection ) to make it even better and more of "just works".

tizbac (tizbac2) wrote :

I have the same problem on Ubuntu 11.04 with 2.6.38-8-lowlatency. The logs are almost useless: there is no trace of a crash, it just says "Killed" on the console after about 5 minutes.

David Gomes (davidgomes) wrote :

I have the exact same problem, pulseaudio starts when I login, but crashes after 5 minutes.

If I run it through a console, "pulseaudio", I get:

W: pid.c: Stale PID file, overwriting.
#After a few minutes or sometimes even less than a minute
Killed

I have already done what was suggested in #22, but it still didn't work.

I'm using GNOME, Ubuntu 11.04 64-bits.

Changed in linux-rt (Ubuntu):
status: Confirmed → Incomplete
status: Incomplete → Confirmed
danmb (danmbox) wrote :

@David (#36): are you running a RT kernel though? PA tends to get killed for many other reasons, being the stable piece of software that it is...

David Gomes (davidgomes) wrote :

@Dan Muresan (#37), how can I tell whether or not I am using an RT kernel? I do not currently know.

danmb (danmbox) wrote :

Run

uname -a

on the command line and post the results. You're probably not running a real-time kernel, because those stopped at around 2.6.33. But log the results here anyway.
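
A rough heuristic for the same check, as a sketch: it assumes RT builds carry an "-rt" or "-realtime" suffix in the release string (as the kernels named in this report do) or mention PREEMPT RT in the version string. The function name is hypothetical, and a kernel that matches neither pattern could still be a real-time build.

```python
import platform

# Flavour suffixes seen on the RT kernels mentioned in this report.
RT_FLAVOURS = ("-rt", "-realtime")


def looks_like_rt_kernel(release, version=""):
    """Guess from uname-style strings whether a kernel is a real-time build."""
    if release.endswith(RT_FLAVOURS):
        return True
    return "PREEMPT RT" in version or "PREEMPT_RT" in version
```

On the running system, looks_like_rt_kernel(platform.release(), platform.version()) applies the check to the same strings that uname -r and uname -v print.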

David Gomes (davidgomes) wrote :

uname -a
"Linux DavidPC 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux"

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "soft RLIMIT_RTTIME lower than hard limit" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of ubuntu-sponsors, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch