samba panics with sys_setgroups failed

Bug #1075670 reported by Daniel Lee
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Incomplete
Importance: High
Assigned to: Unassigned
Milestone: none listed

Bug Description

Running precise with the quantal kernel, as detailed in this testcase http://packages.qa.ubuntu.com/qatracker/milestones/223/builds/25321/testcases , results in samba panics and core dumps. Connecting from Windows reliably triggers the panic, and connections from other Ubuntu machines sometimes trigger it as well.

[2012/11/06 08:04:26.317198, 0] lib/util.c:1117(smb_panic)
  PANIC (pid 6406): sys_setgroups failed
[2012/11/06 08:04:26.401283, 0] lib/util.c:1221(log_stack_trace)
  BACKTRACE: 21 stack frames:
   #0 smbd(log_stack_trace+0x1a) [0x7f839688baea]
   #1 smbd(smb_panic+0x25) [0x7f839688bbc5]
   #2 smbd(+0x163be6) [0x7f83965d6be6]
   #3 smbd(set_sec_ctx+0x8f) [0x7f83965d6f0f]
   #4 smbd(+0x152db5) [0x7f83965c5db5]
   #5 smbd(+0x17ae2b) [0x7f83965ede2b]
   #6 smbd(+0x17b76e) [0x7f83965ee76e]
   #7 smbd(make_connection+0x1ea) [0x7f83965eeaea]
   #8 smbd(reply_tcon_and_X+0x1dc) [0x7f839659f82c]
   #9 smbd(+0x176fa4) [0x7f83965e9fa4]
   #10 smbd(+0x1773bb) [0x7f83965ea3bb]
   #11 smbd(+0x1777d3) [0x7f83965ea7d3]
   #12 smbd(run_events_poll+0x34e) [0x7f839689b8ae]
   #13 smbd(smbd_process+0x812) [0x7f83965ebf42]
   #14 smbd(+0x68666f) [0x7f8396af966f]
   #15 smbd(run_events_poll+0x34e) [0x7f839689b8ae]
   #16 smbd(+0x428a4a) [0x7f839689ba4a]
   #17 smbd(_tevent_loop_once+0x90) [0x7f839689c5d0]
   #18 smbd(main+0xed0) [0x7f839656a030]
   #19 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f839341576d]
   #20 smbd(+0xf7515) [0x7f839656a515]
[2012/11/06 08:04:26.406731, 0] lib/util.c:1122(smb_panic)
  smb_panic(): calling panic action [/usr/share/samba/panic-action 6406]
[2012/11/06 08:04:31.805828, 0] lib/util.c:1130(smb_panic)
  smb_panic(): action returned status 0
[2012/11/06 08:04:31.836695, 0] lib/fault.c:372(dump_core)
  dumping core in /var/log/samba/cores/smbd

Daniel Lee (longinus00) wrote :
Daniel Lee (longinus00) wrote :
Daniel Lee (longinus00) wrote :

I was able to find something similar on the web.

https://bugzilla.samba.org/show_bug.cgi?id=9310

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: quantal
Daniel Lee (longinus00) wrote :

Helpfully, Samba can mail a gdb backtrace when it dumps core.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Daniel Lee (longinus00) wrote :

I have installed 3.5.0-18 from precise-proposed and this problem is still reproducible.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.7 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc5-raring/

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Daniel Lee (longinus00) wrote :

For my future tests I rolled a VM to make sure everything was a clean slate.

Starting from a 12.04.1 server install, I selected the OpenSSH and Samba tasks. I confirmed that there were no issues connecting with a 3.2 kernel and that the 3.5 kernel made samba panic. I then tested samba with 3.7.0-030700rc5-generic and experienced the same panic (sys_setgroups failed). In all instances the connecting machine was the 32-bit Windows XP SP3 computer that was hosting the VM.

For my smb.conf I made it as minimal as possible.

--- smb.conf.orig
+++ smb.conf
@@ -289,6 +289,16 @@
 ; create mask = 0600
 ; directory mask = 0700

+ guest account = nobody
+
+[share]
+ comment = test
+ path = /home/
+ browseable = no
+ read only = yes
+ guest ok = yes
+ guest only = yes
+
 [printers]
    comment = All Printers
    browseable = no

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Joseph Salisbury (jsalisbury) wrote :

I'd also like to perform a bisect to figure out what commit caused this regression. It would be very helpful to know the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the earliest kernel version that has this bug:

v3.3 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-precise/
v3.4 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-quantal/
v3.5-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-rc4-quantal/

You don't have to test every kernel, just up until the kernel that first has this bug.

Thanks in advance!

tags: added: performing-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Daniel Lee (longinus00) wrote :

Before I test other kernel versions I would like to point out that there is no issue if running a straight quantal install. There is only an issue with running a newer kernel in a precise install.

Daniel Lee (longinus00) wrote :

I need to retract my statement in #8. I think I was mistaken about 3.7 not working. I likely mistook a connection refusal that had some other cause (perhaps a timeout from a previous connection; Windows does not give very descriptive prompts) for the sys_setgroups failed issue, because I wasn't diligent about checking timestamps in the logs. Before testing the three kernels you listed, I first re-tested whether 3.2 worked and 3.5/3.7 did not. It seems that 3.7 does work, as do all three mainline kernels you listed in #10.
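One way to avoid this kind of mix-up is to pull the panic entries and their timestamps out of the smbd log and compare them against the time of each connection attempt. The following is a minimal Python sketch that assumes the log layout shown in the bug description (a bracketed timestamp header line followed by an indented message line); `find_panics` is a hypothetical helper, not part of Samba.

```python
import re
from datetime import datetime

# Header lines look like: [2012/11/06 08:04:26.317198, 0] lib/util.c:1117(smb_panic)
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+),\s*\d+\]")
# Message lines look like:   PANIC (pid 6406): sys_setgroups failed
PANIC = re.compile(r"PANIC \(pid (\d+)\): (.*)")


def find_panics(log_text):
    """Return a list of (timestamp, pid, message) for each PANIC entry."""
    panics = []
    stamp = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Remember the timestamp of the most recent header line.
            stamp = datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S.%f")
            continue
        panic = PANIC.search(line)
        if panic and stamp is not None:
            panics.append((stamp, int(panic.group(1)), panic.group(2).strip()))
    return panics


if __name__ == "__main__":
    sample = (
        "[2012/11/06 08:04:26.317198, 0] lib/util.c:1117(smb_panic)\n"
        "  PANIC (pid 6406): sys_setgroups failed\n"
    )
    for when, pid, message in find_panics(sample):
        print(when.isoformat(), pid, message)
```

A connection attempt with no matching panic entry within a few seconds most likely failed for some other reason.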

I am reminded that the poster of the upstream bug I linked mentions that the connection issues might not crop up right away, but I am absolutely sure that every attempt I have made while the server was running 3.5.0-17/18 has resulted in a sys_setgroups panic. The fact that it does not occur in any of the other kernels suggests this might be due to an Ubuntu-carried patch.

Sorry for the confusion.

tags: removed: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

Thanks for the feedback.

Can you test the latest 13.04 development kernel and see if it exhibits this bug? It can be downloaded from:

https://launchpad.net/ubuntu/+source/linux/3.7.0-2.8/+build/3986205

Daniel Lee (longinus00) wrote :

Okay, this case is getting weirder.

I installed 3.7.0-2.8 from the link and I also installed the 3.5.7 mainline build and both seemed fine. However, I was eventually able to boot into a 3.5.0-18 session that could actually connect without throwing samba panics. By chance I was able to discover that restarting smbd will (as far as I can tell) guarantee hitting the samba panic on the next connection for affected kernels. So if I boot into 3.5.0-18 and there is no panic right away I will hit one if I restart smbd (presumably if I wait long enough then the panics might show up anyway without needing a restart as is alluded to in the upstream bug report). Testing on 3.2.0 confirmed that it was immune to this issue because restarting samba did not cause all subsequent connections to cause a panic.

Testing all the kernels again using this new method, it seems that kernels 3.3 and 3.4 are fine, but starting with 3.5-rc4 I can trigger the bug, so perhaps my 3.7 report from yesterday wasn't a mistake on my part after all. I tried the restart-smbd trick in a 12.10 install and I'm still not able to cause a panic there. I could go on to test the earlier 3.5 release candidates, but at this point I think it might be better to build newer versions of samba and see if they fix the issue.
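For reference, the search for the first affected kernel can be sketched as a simple bisection over the versions tested so far. This is only an illustration: `first_bad` is a hypothetical helper, the results table is transcribed from my tests with the restart-smbd method, and the approach assumes that once a kernel is affected all later ones are too.

```python
def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad() is true, or None.

    Assumes versions are ordered oldest to newest and that every version
    after the first bad one is also bad.
    """
    if not versions or not is_bad(versions[-1]):
        return None
    lo, hi = 0, len(versions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # first bad version is at mid or earlier
        else:
            lo = mid + 1      # first bad version is after mid
    return versions[lo]


# Results observed so far (True means samba panicked after restarting smbd).
observed = {"3.2": False, "3.3": False, "3.4": False, "3.5-rc4": True}
ordered = ["3.2", "3.3", "3.4", "3.5-rc4"]

if __name__ == "__main__":
    print(first_bad(ordered, lambda version: observed[version]))
```

With these results the search lands on 3.5-rc4, so the earlier 3.5 release candidates are the only ones left to narrow down.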

Daniel Lee (longinus00) wrote :

So after spending some time debugging this issue and figuring it out, I find that it has already been reported against quantal, back when quantal was not yet at 3.6.6: https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1016895 . I can confirm that my issue is a duplicate of that one.

There is already a good writeup on the samba side of the issue in that bug report, but I can explain here why it only happens on kernels 3.5 and higher. Looking at the setgroups syscall code in kernel/groups.c shows that a major rewrite landed in 3.5 as part of the user namespace patches. One consequence is that the groups_from_user function, which is called by setgroups, now explicitly tests against a gid of -1. This is done because that value is now overloaded as INVALID_GID in the user namespace implementation, where it indicates that a gid has no mapping to the kernel namespace.
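The behaviour change can be modelled in a few lines of Python. This is only an illustrative sketch and not kernel code: `check_group_list` and its `reject_invalid_gid` flag are hypothetical names, the 32-bit value for (gid_t)-1 assumes Linux's unsigned 32-bit gid_t, and the EINVAL result is, as far as I can tell, how the kernel reports the rejection.

```python
import errno

# (gid_t)-1 when gid_t is an unsigned 32-bit integer, as on Linux.
INVALID_GID = 0xFFFFFFFF


def check_group_list(gids, reject_invalid_gid=True):
    """Model the validation applied to a supplementary group list.

    Returns 0 if the list would be accepted, or a negative errno value
    if it would be rejected. With reject_invalid_gid=False the function
    models kernels before 3.5, which did not test for a gid of -1.
    """
    for gid in gids:
        if reject_invalid_gid and gid == INVALID_GID:
            # 3.5 and later: -1 means "no mapping", so the call fails.
            return -errno.EINVAL
    return 0


if __name__ == "__main__":
    groups = [65534, INVALID_GID]  # a list that happens to contain -1
    print("pre-3.5 model:", check_group_list(groups, reject_invalid_gid=False))
    print("3.5+ model:   ", check_group_list(groups))
```

Any caller that passes a group list containing -1 therefore works on older kernels and fails on 3.5 and later, which matches the sys_setgroups failed panic seen here.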

Now that 3.5 is in precise and is planned to become the default for the next point release, I think it's important that the upstream cherry-pick from bug 1016895 be included in samba 3.6.3 as soon as possible.
