Kernels from 3.8.x to 3.11.x panic on bluetooth DUN disconnect

Bug #1165433 reported by Sergio Callegari
This bug affects 20 people
Affects: linux (Ubuntu)
Status: Incomplete
Importance: High
Assigned to: Unassigned

Bug Description

The issue is obviously in the kernel, which should not panic under any circumstances.
This bug is seen on quantal using the kernel from PPA mainline. Tested with 3.8.0 to 3.8.6.
Since 3.8.x is going to be the raring kernel I believe that this should definitely be fixed before raring is shipped.

Seen on:

DELL E6500 with kubuntu quantal 12.10 64 bit and as said, kernel 3.8.6 from the mainline ppa.
The machine has a Dell Computer Corp. Wireless 370 Bluetooth Mini-card (connected via an internal usb connection).

The issue is shown when connecting to the internet via a Samsung Galaxy S plus phone, using a bluetooth DUN connection.
It is reproducible every time.

How to reproduce:

1) Use the bluetooth applet to discover the phone and associate to it.
2) Use network manager to set up a DUN connection with the phone through your APN
3) Connect to the internet via bluetooth DUN (connection works perfectly)
4) Disconnect from the network manager.

At the same time you disconnect, the GUI session is terminated and the kernel panics, briefly showing a panic log on the screen.

Note that:

a) The issue is not present using the standard ubuntu quantal kernel
b) The issue is not present using kernels from the mainline ppa before 3.8 (e.g., 3.7.x is fine for all x)
c) The issue is not present when connecting to the internet using a USB mobile dongle (e.g. Huawei usb key)

This looks pretty serious to me: the kernel does not sync when panicking, so there is a serious risk of data loss; connecting to the internet via a smart phone using bluetooth DUN seems to be something that one should take for granted on any modern OS. Furthermore, points a) and b) above show that this is a *regression* over previous kernels.

affects: bcmwl (Ubuntu) → ubuntu
Javier López (javier-lopez) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. This bug did not have a package associated with it, which is important for ensuring that it gets looked at by the proper developers. You can learn more about finding the right package at https://wiki.ubuntu.com/Bugs/FindRightPackage . I have classified this bug as a bug in linux.

When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://help.ubuntu.com/community/ReportingBugs.

affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1165433

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Sergio Callegari (callegar) wrote : Re: Kernel 3.8.x panics on bluetooth DUN disconnect

Cannot attach the apport collect data, since I am not on the machine where I first saw the issue.

However, not only can I confirm the bug, but I can also report that I am now sure it is not restricted to the DELL E6500 host on which I first saw it.

It is 100% reproducible also on a desktop machine with AMD Phenom II processor, kubuntu quantal 64 bit and an external USB bluetooth dongle, when connecting to the internet via the same Samsung Galaxy S phone, through bluetooth DUN. As before the kernel panic happens on disconnect.

Bluetooth adaptor is a Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.9 kernel[0]. You will need to install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc6-raring/

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: needs-bisect raring regression-release
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Sergio Callegari (callegar) wrote :

Yes, the bug is also in the latest kernel. Sorry for the typo in #4. That was 3.9 RC6!

Sergio Callegari (callegar) wrote :

Not fixed in 3.9RC7 either.

tags: added: kernel-bug-exists-upstreamse regression-relea
removed: regression-release
tags: added: kernel-bug-exists-upstream regression-release
removed: kernel-bug-exists-upstreamse regression-relea
Nikolaus Waxweiler (madleser) wrote :

Same issue with my Nokia C7 (stock 13.04 x86). One thing that I noticed is that the NetworkManager indicator is having trouble displaying the name of the connection, it looks approximately like this:

---
Nokia C7-00 N
No description
Disconnect
---

Sometimes instead of "No description", it shows nothing or random garbage like a pointer is jumping around memory. Not sure if that helps but I wanted to mention it.

Sergio Callegari (callegar) wrote :

Bug is also in the just released upstream 3.8.9... and, as shown by the previous report, in Ubuntu Raring...
Together with bug 1112652 it looks like 3.8.x is a bit of a network killer...

philippe-pachot (philippe-pachot) wrote :

I have just upgraded from 12.10 to 13.04 and I have the same issue: kernel panic when disconnecting from DUN bluetooth, on an Asus eeepc 1005ha.

Bucho (buchohr-deactivatedaccount) wrote :

I also have this problem on Xubuntu 13.04. When I disconnect the connection, there is a kernel panic. If you try to restart the machine without disconnecting, there is also a kernel panic.

There is a workaround. You click on the bluetooth icon and disable the bluetooth. The connection drops without the kernel panic. If you enable the bluetooth connection again, in the NetworkManager you get some unknown or garbled name for your connection. If you logout and login again, the name of the connection is correct again and you can connect without a problem.

Sergio Callegari (callegar) wrote :

3.9.2 still affected.

Why is this bug still marked incomplete rather than confirmed? Marking as confirmed after reports #11 and #10.

Thanks to Ante for the workaround.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Eric Shattow (eshattow) wrote :

Reproducible on linux-image-3.8.0-21-generic. Not yet reproducible on linux-image-3.10.0-031000rc3. Would bisecting help?

eccerr0r (blc+launchpad) wrote :

I saw this same issue in Gentoo Linux, and I also suspect the kernel. I haven't had a chance to try newer kernels, but 3.8.13-gentoo exhibits this issue 100% of the time.

Since I had to apply a patch (probably same as used for other distros whose NM works with BT out of the box) for Gentoo's Networkmanager to see the bluetooth rfcomm, I figure that this would be tough to get support from Gentoo side (I wonder why this patch hasn't made its way upstream?)

eccerr0r (blc+launchpad) wrote :

I just tested the raw linux-3.10-rc5 from kernel.org and it appears to not have this issue.

eccerr0r (blc+launchpad) wrote :

Before concluding that it is fixed, I subsequently had my 3.10-rc5 hang hours later. Unsure if it was latent damage caused by the same issue as before or another bug... more testing needed.

eccerr0r (blc+launchpad) wrote :

Ok, confirmed, behavior is different in 3.10-rc5 but still not expected result. It crashed, but this time my machine completely hangs within X11 instead of dropping back to KMS console. Complete hang as before.

Sergio Callegari (callegar) wrote :

For me, even with 3.9 this is the case. When I try to disconnect from DUN, at times I get thrown to the text console with an oops. At times I just see the machine freeze in X11. At times the machine keeps working in X11 for a few seconds (or up to 1 minute), and in this case I can even see the oops in dmesg before the freeze or a crash.

Does this happen on every architecture? Mine is X64, with intel graphics and Dell Computer Corp. Wireless 370 Bluetooth.

If it happens on every possible architecture, it may be worth disabling BT DUN altogether until someone upstream can look into it, since crashes and freezes may lead to data loss. Upstream has known about this since March, hence either it is very subtle to catch or there is really little interest in BT DUN.

Sergio Callegari (callegar) wrote :

Incidentally, it is still there in brand new 3.9.6.

And I had exactly the behavior indicated above.

After disconnection the machine kept running fine for about 1 min, then froze completely.

eccerr0r (blc+launchpad) wrote :

Architecture agnostic for me, it crashes on 3.8.13-gentoo on x86 (32-bit, eeepc 900a, targus USB stick)

eccerr0r (blc+launchpad) wrote :

I just reproduced this on my core-i7 x86-64 with the targus USB stick since it has a serial port to do console.

Unfortunately the oops dump is not helpful - it says the oops occurred in metacity. Perhaps this is why it's so difficult to debug...
Added it anyway, this is the first oops that came from my system.

eccerr0r (blc+launchpad) wrote :

This is Gentoo built, Linux-3.9.2 (from kernel.org):

This doesn't seem very helpful for me, all it's pointing to is some massive kernel table corruption.

Note that, using the serial console, the machine was still able to respond to the console but no other I/O was accessible. Every new I/O attempt would generate another oops - the machine is unusable but data can be collected.

Sergio Callegari (callegar) wrote :

Can someone please make this quick test?

1) Get out of X (e.g. by logging out the graphical desktop and switching to a virtual terminal)
2) Use nmcli to connect and disconnect from the BT dun connection

Do you still /always/ see the kernel oops?

eccerr0r (blc+launchpad) wrote :

Here's another way to trigger the crash:
1. Set up and use rfcomm/btusb as normal.
2. stop bluetooth daemon

It's not necessary to disconnect from networkmanager to trigger this.

This time the first anomaly is a warning in get_work_pool. This time it says pppd, and it was trying to release the connection. I need to set up serial console again but the call trace is (typed by hand):
warn_slowpath_common
warn_slowpath_null
flush_work
? kfree
__cancel_work_timer
cancel_work_sync
tty_ldisc_halt
tty_ldisc_release
tty_release
__fput
____fput
task_work_run
do_notify_resume
int_signal

Less than one second after, I get a full oops - this time in X, and the call trace is pretty much the same as the previously posted oops.

eccerr0r (blc+launchpad) wrote :

And once again I need to clarify, this is with Gentoo Linux.
When I mention "stop bluetoothd" it's a lot more than just killing it - I meant /etc/init.d/bluetooth stop from the command line.

This time I got the oops sync'ed and stored in my syslog.

eccerr0r (blc+launchpad) wrote :

I wonder if this is the bug we're running into here...

http://marc.info/?l=linux-bluetooth&m=136868678418771&w=2

Will have to study this when I get some time...

Sergio Callegari (callegar) wrote :

Looks very likely. But also bad, since rfcomm is among the top users of this and linux BT people knew there was something wrong since http://marc.info/?l=linux-bluetooth&m=136386669411447&w=2, with more info in http://marc.info/?l=linux-bluetooth&m=136537407411019&w=2, or maybe even earlier, e.g. http://marc.info/?l=linux-bluetooth&m=136363425514280&w=2.

Anyway... let's hope linux can get rid of its blue (tooth) screen of death ;-)

Many thanks for finding it.

eccerr0r (blc+launchpad) wrote :

This workaround, though the original submitter doesn't think it does much, seems to at least prevent my machine from crashing...

    diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
    index 6d9e0b2..a4f4fa9 100644
    --- a/drivers/tty/tty_port.c
    +++ b/drivers/tty/tty_port.c
    @@ -140,6 +140,10 @@ EXPORT_SYMBOL(tty_port_destroy);
     static void tty_port_destructor(struct kref *kref)
     {
     	struct tty_port *port = container_of(kref, struct tty_port, kref);
    +
    +	/* check if last port ref was dropped before tty release */
    +	if (WARN_ON(port->itty))
    +		return;
     	if (port->xmit_buf)
     		free_page((unsigned long)port->xmit_buf);
     	tty_port_destroy(port);

So far the machine still oopses but no hang at least, bluetooth is completely hosed but it was already hosed anyway...
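
For readers who want to see what the guard in that hunk does, here is a minimal userspace sketch of the same idea. It is not kernel code: `model_port`, `model_tty`, `model_port_put` and `drop_last_ref` are hypothetical stand-ins invented for illustration, and only the `itty` back pointer name is borrowed from the quoted patch. The destructor refuses to free the port while a tty is still attached, trading a memory leak for the absence of a dangling pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_tty { int id; };

struct model_port {
    int refs;                 /* stands in for the kref counter */
    struct model_tty *itty;   /* back pointer, set while a tty still uses the port */
    int *freed_flag;          /* lets the caller observe whether the port was freed */
};

/* Same idea as the quoted hunk: refuse to free while a tty is attached. */
static void model_port_destructor(struct model_port *port)
{
    if (port->itty) {
        fprintf(stderr, "warning: last port ref dropped before tty release\n");
        return;               /* deliberately leak instead of freeing */
    }
    *port->freed_flag = 1;
    free(port);
}

static void model_port_put(struct model_port *port)
{
    if (--port->refs == 0)
        model_port_destructor(port);
}

/* Drops the only reference; returns 1 if the port was freed, 0 if the guard kept it. */
static int drop_last_ref(int tty_attached)
{
    static struct model_tty tty = { 0 };
    int freed = 0;
    struct model_port *port = malloc(sizeof(*port));

    if (!port)
        return -1;
    port->refs = 1;
    port->itty = tty_attached ? &tty : NULL;
    port->freed_flag = &freed;

    model_port_put(port);
    if (!freed)
        free(port);           /* demo cleanup only: the guard left it allocated */
    return freed;
}
```

Calling `drop_last_ref(1)` models the failing case (the tty still points at the port) and leaves the memory alone, while `drop_last_ref(0)` frees it normally. This matches the observation above: the guard avoids the hang but does not repair the underlying lifetime problem.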

eccerr0r (blc+launchpad) wrote :

whoops credit to LKML Peter Hurley, forgot to give credit to the writer... (though it's not a fix)

Sergio Callegari (callegar) wrote :

Unfortunately, since 3.9.8 it is no longer possible to test mainline kernels on raring.
Won't be easy to say if things get fixed or not upstream.

eccerr0r (blc+launchpad) wrote :

I just saw a mail fly by on linux-bluetooth and linux-serial mailing lists that seems to be a patch one person made that alleviates the issue.

I'm not sure of all the archive areas for the mailing list but as I get from the mail list, look for this message with patch attachment:

Date: Sat, 6 Jul 2013 10:43:43 +0200
From: Gianluca Anzolin <email address hidden>
To: <email address hidden>
Cc: <email address hidden>, <email address hidden>, <email address hidden>
Subject: [RFC] PATCH: rfcomm tty refcount fixes

Oleg Pashuta (midihome)
Changed in linux (Ubuntu):
assignee: nobody → Oleg Pashuta (midihome)
assignee: Oleg Pashuta (midihome) → nobody
Oleg Pashuta (midihome) wrote :

Ante Bucan (abucan) wrote:
> If you logout and login again, the name of the connection is correct again and you can connect without a problem.

Instead of logging out and in again, I uncheck "use DUN connection" for my mobile and create the connection again, which avoids having to close other programs.

Sergio Callegari (callegar) wrote :

Tested 3.10.2 from the mainline ppa (in spite of the bad experience with 3.9.8 that is broken).

The linux bluetooth screen of death is still there, 127 days since first notification on the LKML for a fully reproducible regression that completely impairs a subsystem and causes a hard kernel crash with a disastrous memory corruption, sigh.

Specifically, the RFCOMM session disconnection fixes that got applied to Linux 3.10 commits (8 off) 24fd642ccb24c8b5732d7d7b5e98277507860b2a to fea7b02fbf73adb2e746f00ed279a782de7e74e4 do not help.

All LKML activity on this stopped on 2013-06-25. From that thread, it looks like the kernel developers believe that a hard power down of the remote is necessary to reproduce the bug ('Yes. With power down I meant a hard power down, such that the remote doesn't has the chance to close the session cleanly.'). This does not seem to be the case. A simple bluetooth DUN activation / deactivation cycle from network manager (targeting a Galaxy S mobile phone that is never switched off) seems to trigger the bug and the kernel crash here. They may also think that the bug is fixed by the latest RFCOMM session disconnection fixes, or that at least it is reduced in scope (so that only a process is killed and a TTY not released, without the disastrous memory corruption in the kernel). Either we are discussing here a completely different bug from the one in the LKML thread http://news.gmane.org/find-root.php?message_id=%3c519480A1.6030909%40ahsoftware.de%3e or the kernel developers are missing some relevant information.

Can someone from the ubuntu kernel team forward this info to Alexander Holler and others on the LKML list revitalizing the thread?

In the meantime, may I again suggest unconfiguring RFCOMM TTY support from ubuntu production kernels as an SRU until this is fixed, so that DUN remains non-functional but at least the kernel crashes that may lead to data loss are prevented? The thread on LKML does not exclude data loss: '[crash reports] don't make much sense because they happen because of a disastreous memory corruption, which means the BUGs can include almost everything (hopefully nothing which eats your disk contents).'

eccerr0r (blc+launchpad) wrote :

Please look at Gianluca Anzolin's patches that were submitted Jul 12 and reviewed by Peter Hurley.

I have yet to test this but this looks promising.

eccerr0r (blc+launchpad) wrote :

Looking through LKML, Gianluca has some additional patches on Jul 22.
Looks like this is not done yet. I'll say it again, not sure where I said it in the past, this is one big ball of spaghetti that needs to be unwound and not broken further...

Sergio Callegari (callegar) wrote :

Isn't this just fixing a leak? Namely, that an object is not destroyed when it should so memory is wasted?
From previous messages on LKML the issue causing the crash seemed to be the opposite... that an object is destroyed and then its data is used again causing memory corruption...

eccerr0r (blc+launchpad) wrote :

I read it as destroying the tty structure before it was completely cleaned up, so future references to the tty struct were pointing to freed memory. Even better it looks like a new version of the patch set was released today Jul 26, in fact just now pretty much, subject "rfcomm: Implement rfcomm as a proper tty_port" where Gianluca writes:

"This patchset addresses an issue with the rfcomm tty driver in the
current stable kernels that manifests itself as a sudden lockup of the
whole machine or as a OOPS if we are lucky enough (I wasn't).

Triggering the problem is very easy:

1) establish a bluetooth connection with a bluetooth host
2) open the tty it provides with some program
3) turn off the bluetooth host or take it out of range"

which sounds very much like the problem we're seeing.
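
To make the lifetime argument concrete, here is a small userspace sketch of the reference-counting pattern such a fix relies on. It is an illustration under assumptions, not the actual patch set: `rc_port`, `rc_port_get`, `rc_port_put` and `remote_vanishes_while_tty_open` are hypothetical names. The point is that the open tty holds its own reference, so releasing the device while the tty is still open cannot destroy the port underneath it.

```c
#include <stdlib.h>

/* Hypothetical reference-counted port; not the kernel's struct tty_port. */
struct rc_port {
    int refs;
    int *destroyed;   /* set to 1 when the final put destroys the port */
};

static struct rc_port *rc_port_new(int *destroyed)
{
    struct rc_port *p = malloc(sizeof(*p));

    if (p) {
        p->refs = 1;              /* reference owned by the device */
        p->destroyed = destroyed;
        *destroyed = 0;
    }
    return p;
}

static struct rc_port *rc_port_get(struct rc_port *p)
{
    p->refs++;
    return p;
}

static void rc_port_put(struct rc_port *p)
{
    if (--p->refs == 0) {
        *p->destroyed = 1;
        free(p);
    }
}

/*
 * Scenario from the thread: the remote goes away while the tty is open.
 * Returns a bitmask: bit 0 set if the port survived the device release,
 * bit 1 set if it was destroyed once the tty was closed.
 */
static int remote_vanishes_while_tty_open(void)
{
    int destroyed = 0;
    int result = 0;
    struct rc_port *dev_ref = rc_port_new(&destroyed);

    if (!dev_ref)
        return -1;

    struct rc_port *tty_ref = rc_port_get(dev_ref); /* tty open takes its own ref */

    rc_port_put(dev_ref);         /* device released: the tty still holds a ref */
    if (!destroyed)
        result |= 1;

    rc_port_put(tty_ref);         /* tty closed: last reference, port destroyed */
    if (destroyed)
        result |= 2;

    return result;
}
```

In this model `remote_vanishes_while_tty_open()` returns 3: the port outlives the device release and is destroyed only when the tty lets go. The crash described in this thread corresponds to the opposite ordering, where the port is freed while the tty still points at it.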

eccerr0r (blc+launchpad) wrote :

Oh I'm terribly sorry, I am incorrectly referring to LKML when I should have been writing Linux-Bluetooth mailing list. This has not made it to anything yet.

Sergio Callegari (callegar) wrote :

Thanks for the notice about the Linux-BT ML... I was looking at the latest posts by Gianluca Anzolin on LKML and that is why I was confused. These look really quite promising.

penalvch (penalvch)
tags: added: needs-kernel-logs needs-upstream-testing regression-potential
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: needs-crash-log
Sergio Callegari (callegar) wrote :

ubuntu-bug is buggy and hangs all the time with "You are already logged in You are already logged in as Sergio Callegari. If this is not you, please log out now." To make it work, you need to log out and rerun apport-collect, which is not nice. Plus, I am now trying 3.10.3 and obviously apport is unhappy with it. Please save me the reboot.

Really, there should be no need of anything else. From previous messages: bug is acknowledged on the LKML and the linux-bluetooth ML and patches are already floating around. All kernels from 3.8 to 3.10.3 are affected. Bug is due to the introduction of the tty_port data structure for which rfcomm has never been updated. Thus, there is nothing that can be bisected and reverted to fix. And as already mentioned above, the crash log is meaningless, since there is a massive memory corruption.

penalvch (penalvch)
tags: added: kernel-bug-exists-upstream-v3.9-rc7
removed: kernel-bug-exists-upstream
deadprogram (ron-s) wrote :

This is definitely still a problem; it affects 13.04 running on a brand new Dell XPS 13 Developer Notebook.

penalvch (penalvch) wrote :

deadprogram, if you have a bug in Ubuntu, the Ubuntu Kernel team, Ubuntu Bug Control team, and Ubuntu Bug Squad would like you to please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Kernel team article:
https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports

the Ubuntu Bug Control team and Ubuntu Bug Squad team article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report would delay your problem being addressed as quickly as possible.

Thank you for your understanding.

Sergio Callegari (callegar) wrote :

Actually, there should be no need to ask for the ubuntu-bug report to users that are merely confirming a bug for which everything is already known and everyone is merely waiting for Gianluca Anzolin's patches to land on the stable update kernel channels.

BTW, ubuntu-bug is quite buggy itself. I guess this is why many people do not use it. See https://bugs.launchpad.net/ubuntu/+source/apport/+bug/642631 and https://bugs.launchpad.net/ubuntu/+source/apport/+bug/1200124. Having ubuntu-bug fixed would be the best incentive to push people to use it. No one likes using a tool that evidently has issues and at the same time asks you for the root password.

ekin (ekin) wrote :

I am having similar kernel panics while using a USB bluetooth dongle to connect my new desktop (Ubuntu 13.04, kernel 3.8.0-27-generic) to a bluetooth external speaker via A2DP. I went through the correspondence between Gianluca Anzolin et al. on the linux-bluetooth mailing list, and it seems like the patch review is still a work in progress.

Meanwhile a similar problem was reported in bug 1189998 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1189998). When I searched for the phrase "bluetooth kernel panic" in launchpad, I saw many other bug reports on kernel panics related to bluetooth. Should I be looking into different bug reports for issues specifically related to A2DP, or is it very likely that this bug, when patched, will solve A2DP related issues as well?

penalvch (penalvch) wrote :

ekin, if you have a bug in Ubuntu, the Ubuntu Kernel team, Ubuntu Bug Control team, and Ubuntu Bug Squad would like you to please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the official Ubuntu documentation:
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices
Ubuntu Community: https://wiki.ubuntu.com/ReportingBugs

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report would delay your problem being addressed as quickly as possible.

Thank you for your understanding.

penalvch (penalvch) wrote :

Sergio Callegari, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Jussi Saarinen (jussaar) wrote :
Sergio Callegari (callegar) wrote :

Apparently the patches do not include the Cc tag for <email address hidden> in the sign-off area, which ensures that patches merged upstream are also applied to the stable kernels without anything else needing to be done by the author or subsystem maintainer.

Should Gianluca Anzolin or Gustavo Padovan be kindly asked to ensure that these patches are also forwarded to Greg KH for consideration in 3.10.8 and other post-3.8 stable series?

Jussi Saarinen (jussaar) wrote :

The patch series is apparently "too extensive to consider for -stable" [1]. So another solution is required for stable kernels. Gianluca's fix should eventually end up in mainline though (3.12 hopefully).

[1] http://marc.info/?l=linux-bluetooth&m=137762583515880&w=2

Sergio Callegari (callegar) wrote :

If the patch series is not applied, after 6 months of raring there will be 6 more months of saucy without the possibility to use the bluetooth tethering facility (bluetooth DUN) offered by mobile phones. Sounds bad.

In any case if this patch set is too extensive to apply, the obvious conclusion is that the not-too-extensive patch to apply is disabling rfcomm altogether in 3.8, 3.9, 3.10, 3.11. As is, rfcomm is totally broken, unusable and quite dangerous. It makes the kernel crash systematically at every use. Furthermore, the crash is due to kernel data being randomly rewritten with garbage, which means that the actual crash may happen *seconds to minutes after the actual issue provoking it* (seen > 40 sec personally). It is a bad situation that can easily lead to severe data loss in addition to down time.

Sergio Callegari (callegar) wrote :
Jussi Saarinen (jussaar) wrote :
Jussi Saarinen (jussaar) wrote :

Gianluca Anzolin writes on the linux-bluetooth mailing list that, though his tty refcount patch series is needed, more work is required to fix the problem. If I understood his mailing list message correctly, the system locks up when the device is released even after his patches have been applied.

Source:

http://marc.info/?l=linux-bluetooth&m=137788497602145&w=2

Jussi Saarinen (jussaar) wrote :

Gianluca Anzolin's patches were merged into net-next the day before yesterday, and yesterday they were merged into Linus' master branch. So the patches will be in 3.12 rc1.

Jussi Saarinen (jussaar) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
summary: - Kernel 3.8.x panics on bluetooth DUN disconnect
+ Kernels from 3.8.x to 3.11.x panic on bluetooth DUN disconnect
Sergio Callegari (callegar) wrote :

Fixed the bug title, so that the window of affected kernels is clear and it is clear that this is an issue both for raring and for saucy.

Also marked as confirmed, since the incomplete label was a bit weird at this point.

If I understand correctly the situation now is this

1) All kernels from 3.8.0 to 3.11.x are affected. 3.12 will most likely be fixed, unless something really unexpected happens with the 3.12 RC kernels and the fix eventually needs to be reverted.
2) The fix that is going into 3.12 RC1 would also apply cleanly to 3.11.x (and probably back to 3.8.x, possibly with very minor changes)
3) In spite of the fixing patches only touching rfcomm (that is anyway completely broken in 3.8.x to 3.11.x), they are considered too 'fat' for inclusion in the kernel stabilization series for 3.10.x and 3.11.x (I think that the kernel stabilization series for 3.9.x is terminated, so that 3.9 will not be fixed in any way, and the kernel stabilization series for 3.8.x is in ubuntu's hands).
4) To provide a fix to 3.8.x to 3.11.x those who indicated the 3.12 fix as too fat for the stabilization series suggested a much 'thinner' approach based on BUG_ON. Unfortunately, when tried that approach proved not to fix the issue.

Now I wonder if the declaration of the proper fix as too fat for the stabilization series was in some way influenced by the wrong expectation to be able to count on a thinner fix.

At this point, either
- a proper 'thin' fix comes out soon suitable for the stabilization series and ubuntu imports it in raring and saucy's kernels
- ubuntu applies the proper 'fat' fix to its kernels even if it does not go into the stabilization series, since in any case it only touches a subsystem that is currently utterly broken and too dangerous to use
- the bug remains unfixed in raring and saucy

In the last case, anyone who is unaware may end up trying bluetooth DUN and experiencing a 'delayed' crash and data loss. The others will probably build a 3.8.x with the 'fat' fix on a ppa and live happily.

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: needs-apport-collect
removed: needs-kernel-logs
Sergio Callegari (callegar) wrote :

For the 10th time: tried. It hangs. Apport-collect is buggy and needs inconvenient workarounds to be used. Please push to have it fixed, otherwise it won't be used. If it worked fine, it would have been used from the start.

In any case I cannot see how a bug for which there are kernel patches accepted for the next kernel and bug reports in all distros, plus 14 confirmations here, can be not 'confirmed' and keep bouncing back to 'incomplete'.

Too tired of this bug; it can be closed as far as I am concerned. Unsubscribed.

tags: added: kernel-bug-exists-upstream-v3.11.0
removed: kernel-bug-exists-upstream-v3.9-rc7 needs-apport-collect needs-bisect needs-crash-log needs-upstream-testing regression-potential
penalvch (penalvch)
tags: added: kernel-bug-exists-upstream-v3.11 needs-apport-collect needs-bisect needs-crash-log needs-upstream-testing regression-potential
removed: kernel-bug-exists-upstream-v3.11.0
tags: removed: needs-upstream-testing
Nix (nix-sasl) wrote :

Hey developers/bugzappers, same bug in vanilla kernel 3.10.12 on CentOS 6.4, so it is a kernel bug. I reported it today to kernel.org.

Please track it and help.

https://bugzilla.kernel.org/show_bug.cgi?id=61431

Revision history for this message
penalvch (penalvch) wrote :

Nix, it would be best to report this to your distro -> http://bugs.centos.org/main_page.php

If you can reproduce this in Ubuntu, please file a new report via a terminal:
ubuntu-bug linux

Revision history for this message
eccerr0r (blc+launchpad) wrote :

Preliminary:
I just tried with 3.12-rc1 with blueman. It doesn't work. If someone else could also try this, it would be interesting.
When I tried connecting to dialup networking, it claimed it "cannot connect to networkmanager" after trying to set up the DUN connection for a few moments, implying there was a dbus issue. However the same setup, when I reboot back to 3.6.11, works fine, so userspace should be working.
It may be a kernel config error, but if someone else could also try, it would be good. I did see /dev/rfcomm0 get set up. I need to see if I can at least send AT commands to /dev/rfcomm0, but I don't have a terminal program installed on this laptop...

Revision history for this message
eccerr0r (blc+launchpad) wrote :

More information:

Thanks to busybox being installed, I used its microcom utility and tested that the bluetooth RFCOMM link does work as I can submit the AT commands to the modem and the modem responded as expected. I can subsequently shut down the link too with blueman without seeing a hang/crash. In my opinion it seems to be working but there may be a separate unexpected BAPI change going on here.
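For anyone without microcom, the same check can be sketched in a few lines of Python. This is only a minimal sketch, not what was actually run above: the function name `probe_modem` is made up for illustration, and it assumes the modem answers a plain `AT` with a line containing `OK` (or `ERROR`), possibly after echoing the command back.

```python
def probe_modem(port, max_lines=8):
    """Send a bare AT command and report whether the modem answered OK.

    `port` is any binary file-like object with write/flush/readline,
    for example the RFCOMM device opened in unbuffered binary mode.
    """
    port.write(b"AT\r")
    port.flush()
    # The modem may echo the command and emit blank lines before the
    # final result code, so look at a handful of lines.
    for _ in range(max_lines):
        line = port.readline().strip()
        if line == b"OK":
            return True
        if line == b"ERROR":
            return False
    return False
```

Assuming the link is already bound to /dev/rfcomm0 as described above, it could be called as `probe_modem(open("/dev/rfcomm0", "r+b", buffering=0))`. Note that a read on the real device may block if the modem never answers.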

I also want to make a correction here: the more exact error that I got was:
Connection Failed: Modem Manager did not support the connection.

Revision history for this message
penalvch (penalvch) wrote :

eccerr0r, if you have a bug in Ubuntu, the Ubuntu Kernel team, Ubuntu Bug Control team, and Ubuntu Bug Squad would like you to please file a new report by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report would delay your problem being addressed as quickly as possible.

No need exists to comment here at this time. After reading the above documentation in its entirety, if you have further questions, you are welcome to redirect them to the appropriate mailing list or forum via http://www.ubuntu.com/support/community/mailinglists , or you may contact me directly.

Thank you for your understanding.

Revision history for this message
nitto (nitto) wrote :

My system is affected by the bug too.
Because of the freeze I cannot post a bug report; I can only attach this picture.
If I can dump something useful or post any other information please tell me how.
Regards
Nitto

Revision history for this message
penalvch (penalvch) wrote :

nitto, if you have a bug in Ubuntu, the Ubuntu Kernel team, Ubuntu Bug Control team, and Ubuntu Bug Squad would like you to please file a new report by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) in a pre-3.8.x release via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report would delay your problem being addressed as quickly as possible.

No need exists to comment here at this time. After reading the above documentation in its entirety, if you have further questions, you are welcome to redirect them to the appropriate mailing list or forum via http://www.ubuntu.com/support/community/mailinglists , or you may contact me directly.

Thank you for your understanding.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → Incomplete
Revision history for this message
Georg Altmann (george-george-net) wrote :

I opened a new bug report for the issue: #1256811

Revision history for this message
Georg Altmann (george-george-net) wrote :

Confirmed since I can reproduce this on precise x64.
See bug #1256811

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream-v3.8 precise
removed: needs-apport-collect needs-bisect needs-crash-log
tags: added: kernel-bug
removed: regression-potential
penalvch (penalvch)
tags: added: needs-kernel-logs
removed: kernel-bug-exists-upstream-v3.8
tags: added: needs-crash-log
Revision history for this message
eccerr0r (blc+launchpad) wrote :

Anyone running into this bug on Ubuntu, please test Linux kernel 3.12-release or newer, if you can compile it (or if there is a package available...) This kernel should have the BT DUN fixes.

I really cannot say anything in these forums as I am not running Ubuntu (I have the same bug posted on Gentoo's bugtracker, https://bugs.gentoo.org/show_bug.cgi?id=474432 ) but there are not enough Gentoo users using BT DUN/NetworkManager for me to collaborate with... My observation is that with the 3.12 vanilla kernel the crash goes away if I manually set up an rfcomm link and detach it, but NetworkManager no longer finds BT DUN as a valid dialup device. Reverting back to 3.5.7 or 3.6.11 restores operation with the same userspace.

Revision history for this message
penalvch (penalvch) wrote :

eccerr0r, please do not solicit others to post comments here as "Me too!" comments wouldn't be helpful at this point. If you are using Gentoo, it would be best to engage the Gentoo developers for assistance. Despite this, so your hardware may be tracked, using Ubuntu, could you please file a new report by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

tags: added: needs-upstream-testing
Revision history for this message
penalvch (penalvch) wrote :

Sergio Callegari, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc3

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
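The tagging rules above are mechanical enough to write down as a small helper. This is only an illustrative sketch: the function name `upstream_tags` and the returned dictionary layout are made up here, and Launchpad itself offers no such function. It just spells out which tags the instructions ask to add and remove for a given test result.

```python
def upstream_tags(version, fixed):
    """Return the tags to add and remove after testing a mainline kernel.

    `version` is the tested kernel version, e.g. "3.13-rc3" or "v3.13-rc3".
    `fixed` is True if the bug no longer occurs on that kernel.
    """
    if not version.startswith("v"):
        version = "v" + version
    prefix = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    return {
        "add": [prefix, f"{prefix}-{version}"],
        "remove": ["needs-upstream-testing"],
    }
```

For example, `upstream_tags("3.13-rc3", True)` yields the tags `kernel-fixed-upstream` and `kernel-fixed-upstream-v3.13-rc3` to add, plus `needs-upstream-testing` to remove.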

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Sergio Callegari (callegar) wrote :

There is no activity from me because I have unsubscribed from the bug... it is just by chance that I am reading this now.

This bug was investigated and fixed in recent Linux kernels ages ago, and Jussi, eccerr0r and I have provided all the relevant information and pointers in this thread. It has also been clarified on LKML that, due to the type of bug, it is quite meaningless to provide traces, and that this bug may cause data loss due to corruption of the kernel data structures. Yet, dangerous as it is, and regardless of the many users confirming the issue, the bug remains unconfirmed because there is no apport-collect output, which is considered much more important than all the other info. And since the bug is not confirmed, the ubuntu kernel developers are just ignoring the report and all the useful pointers to upstream discussion. Not surprisingly, ubuntu remains unfixed months after the upstream fix. Similarly unsurprisingly, duplicate bug reports spring up without being recognized as such.

Unfortunately, I must say that in this situation I really find it a waste of time to report bugs, check the LKML, and provide pointers to the relevant discussion there and to the upstream patches that fix the issue. And this is why I gave up. Also, it is a bit sad that eccerr0r is invited to stay out of the discussion in the duplicate 1256811 just because his main OS is gentoo.

I am also really sorry to say that the management of this bug report seems to me as all rules and bureaucracy and zero substance, but I hope that this can be accepted as constructive criticism, since I'm also taking the time to try to explain why I'm getting this impression.

Revision history for this message
eccerr0r (blc+launchpad) wrote :

I as well am somewhat disappointed in the handling of this bug, but only for completeness, here I will report that a fix has been found but has not reached a stable release yet.

Gianluca suggested two patches on linux-bluetooth late last year, made against 3.12.6, and with these two patches I have found that behavior has been restored to what it was pre-3.8. This is very preliminary, as there may still be latent issues, but my initial testing indicates that the two patches have finally completely fixed the problem.

Revision history for this message
Jussi Saarinen (jussaar) wrote :

I managed to find one more bug report similar to this one. So now there are at least four bug reports (including this one) here in Launchpad on this problem:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1144322

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1165433

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1189998

https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1256811

Anyway, if these are about the same bug that I think they are, the bug has now been completely fixed in kernel version 3.14, though kernel version 3.12 included some of the fix. Also there was one previous fix that is also needed that I think was backported to stable before 3.12.

I think I managed to list all the necessary commits. Here are links to the commits, in case they need to be backported:

The first fix (pre 3.12?):
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1d9e689c934bd5ecb0f273c6c65e0655c5cfee5f

The fixes in 3.12:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=396dc223dd36edd218650d042a07c5e61f022c5b

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ebe937f74b8a72cf3ceeae5c2194a160bb092901

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=54b926a1434e817ca84cb090f36b56763e192470

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cad348a17e170451ea8688b532a6ca3e98c63b60

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ece3150dea382c7c961fe2604332ed3474960d25

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ffe6b68cc5999a3f91a15b6667e69e14186e337d

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=29cd718beba999bda4bdbbf59b5a4d25c07e1547

And finally the fixes in 3.14:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5b899241874dcc1a2b932a668731c80a3a869575

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e228b63390536f5b737056059a9a04ea016b1abf

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4a2fb3ecc7467c775b154813861f25a0ddc11aa0

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f86772af6a0f643d3e13eb3f4f9213ae0c333ee4
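If someone wants to try backporting these, the commit ids can be pulled out of the links above and turned into `git cherry-pick` commands. The following is only a rough sketch under stated assumptions: the helper names `commit_id` and `cherry_pick_commands` are made up, the commits are assumed to apply in the order listed, and nothing here checks whether they apply cleanly to an older tree (conflicts are quite possible).

```python
from urllib.parse import urlparse, parse_qs


def commit_id(cgit_url):
    """Extract the commit hash from a cgit URL of the form .../commit/?id=<sha>."""
    query = parse_qs(urlparse(cgit_url).query)
    return query["id"][0]


def cherry_pick_commands(cgit_urls):
    """Build one 'git cherry-pick -x' command per URL, keeping the given order."""
    # -x records the original commit id in the backported commit message.
    return [f"git cherry-pick -x {commit_id(url)}" for url in cgit_urls]


if __name__ == "__main__":
    urls = [
        "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1d9e689c934bd5ecb0f273c6c65e0655c5cfee5f",
    ]
    for command in cherry_pick_commands(urls):
        print(command)
```

The printed commands would then be run by hand inside a checkout of the target kernel tree that has the mainline history available.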

Revision history for this message
Sergio Callegari (callegar) wrote :

To Jussi...

as the original reporter of this bug, I think it has no hope of being fixed even if you provide all the relevant kernel commits.
Which is bad, because the crash is serious and happens 'late', when many kernel data structures may have been written over, with potential data loss.

I tried to provide pointers to all the relevant LKML entries and the only replies I got were that I should have run apport-collect, which was a bit frustrating. I hope that your list, which is much more organized than my posts, has better success.

Besides that, the bad news is that even on trusty (that has kernel 3.14), bluetooth dun is broken. At least here the issue is mere lack of functionality, no crashes. This is due to:

- another kernel bug (it is probably http://lists.openwall.net/linux-kernel/2014/02/10/54). This prevents modemmanager from seeing rfcomm0 as a modem. Tried mainline 3.15.6 and works fine.

- a bug in modemmanager/networkmanager. This keeps rfcomm0 open when the connection is dropped and causes modemmanager to keep asking the bluetooth phone about the signal quality. When you try to connect again, the connection fails. This is easily worked around by "service modemmanager restart".

Probably a new bug should be opened on launchpad against the kernel, wrt the issue of parenting rfcomm0. This fix should be easy to backport. But I am a bit discouraged.
