Samba daemon crashes when transferring large files to share in encrypted home dir

Bug #480849 reported by Christoph E
This bug affects 4 people
Affects         Status   Importance  Assigned to  Milestone
samba (Ubuntu)  Triaged  Medium      Unassigned

Bug Description

Binary package hint: samba

1) 9.10 (Karmic)

2) Samba 2:3.4.0-3ubuntu5

3) I have set up an encrypted home directory which is automatically decrypted when logging in to the system. I have then set up a shared directory within my home dir. I expect to write to and read from this share from external clients when I am logged in at the server.

4) What actually happens is strange, and I do not understand it. Writing and reading most files works fine. Then I chose a 9 GB file to write to the share. At first nothing happens; I can see that the hard disk is working on the server and that the CPU load of smbd rises quickly above 50%. After a few seconds the hard disk stops working and the CPU load reaches 100%. The process cannot even be killed in the usual way. The client (here a Windows 7 machine) gets an error message saying that I don't have permission to write, although other, much smaller files in the same directory worked fine. Even with small files, the CPU load became extremely high (80%) when copying files from the client to the share.
Another problem happened when reading from the encrypted share: it works most of the time, but in some cases the client receives an error message that files could not be read.

An identical share on an unencrypted directory works fine in every case, and the CPU load stays below 20% as usual.

I should add that the CPU load also gets very high when copying or moving files between encrypted and unencrypted directories (no Samba involved).

And last but not least, I had the same problem on the same machine with a previous RC install of Karmic a couple of weeks ago with a slightly different configuration of my partitions.

I am not sure what other documentation is required to track down this problem. My smb.conf is the default config plus the two entries for the shares (see attachment). The encrypted share "Daten" does not work properly, whereas the unencrypted share "Video" works fine.

ProblemType: Bug
Architecture: i386
Date: Wed Nov 11 19:16:08 2009
DistroRelease: Ubuntu 9.10
ExecutablePath: /usr/sbin/smbd
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
NonfreeKernelModules: nvidia
Package: samba 2:3.4.0-3ubuntu5
ProcEnviron: PATH=(custom, no user)
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
SourcePackage: samba
Uname: Linux 2.6.31-14-generic i686

Christoph E (mail-christoph-evers) wrote :
Thierry Carrez (ttx) wrote :

You write "Samba daemon crashes" in your title; what do you mean by that? Does one of the nmbd/smbd processes segfault? Is there anything in the Samba logs (/var/log/samba)? If it's just that the file transfer appears to freeze and CPU use rises to 100%, then it's not really a "crash".

In the latter case, I wouldn't be surprised if the load went up to 100% when encrypting a 9 GB file and the transfer appeared to take forever. You mention copying smaller files with success. Could you please gradually increase the file size and see what happens?
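
The gradual size test requested above could be scripted roughly as follows. This is only a minimal sketch, not something used in this report: the function names, the doubling factor and the idea of running it on a client with the share mounted are all assumptions made for illustration.

```python
import os
import shutil
import time


def size_ladder(start_mb, stop_mb, factor=2):
    """Return test sizes in MB, growing by `factor` from start_mb up to stop_mb."""
    sizes = []
    size = start_mb
    while size <= stop_mb:
        sizes.append(size)
        size *= factor
    return sizes


def make_test_file(path, size_mb, chunk_mb=16):
    """Write size_mb megabytes of random data so the file is not sparse."""
    remaining = size_mb
    with open(path, "wb") as handle:
        while remaining > 0:
            step = min(chunk_mb, remaining)
            handle.write(os.urandom(step * 1024 * 1024))
            remaining -= step


def copy_and_time(source, share_dir):
    """Copy one file into the mounted share and return the elapsed seconds."""
    started = time.monotonic()
    shutil.copy(source, share_dir)
    return time.monotonic() - started


def run_ladder(share_dir, sizes_mb):
    """Create and copy one file per size; stops at the first copy that raises."""
    for size_mb in sizes_mb:
        name = f"test_{size_mb}mb.bin"
        make_test_file(name, size_mb)
        print(size_mb, "MB:", round(copy_and_time(name, share_dir), 1), "s")
```

Calling `run_ladder(share_dir, size_ladder(256, 8192))` with `share_dir` set to wherever the share is mounted on the client (a hypothetical path) would copy files of 256 MB up to 8 GB and show at which size the transfer starts to fail.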

Changed in samba (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Christoph E (mail-christoph-evers) wrote :

I am sorry for the misleading term "crashes" in the title, but I didn't find a better expression while writing the bug report. Samba does not crash or segfault; rather, it raises the CPU load to 100%. I don't know what to call this.

Today I tested several file sizes and stopped at about 3500 MB. There seems to be no fixed limit: a file of this size works in some cases, and in several others it does not work and causes the problem described above. If a file is below this threshold, the probability is very high that the transmission will work.

I also raised the log level in samba to see what happens within samba and found the following:

[2009/11/12 21:00:24, 0] lib/util_sock.c:730(write_data)
[2009/11/12 21:00:24, 0] lib/util_sock.c:1468(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Connection reset by peer
[2009/11/12 21:00:24, 0] smbd/process.c:62(srv_send_smb)
  Error writing 62 bytes to client. -1. (Transport endpoint is not connected)
[2009/11/12 21:00:24, 0] lib/util_sock.c:730(write_data)
[2009/11/12 21:00:24, 0] lib/util_sock.c:1468(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2009/11/12 21:00:24, 0] smbd/process.c:62(srv_send_smb)
  Error writing 104 bytes to client. -1. (Transport endpoint is not connected)
[2009/11/12 21:00:24, 0] lib/util_sock.c:1468(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
[2009/11/12 21:00:24, 0] lib/util_sock.c:730(write_data)
[2009/11/12 21:00:24, 0] lib/util_sock.c:1468(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2009/11/12 21:00:24, 0] smbd/process.c:62(srv_send_smb)
  Error writing 60 bytes to client. -1. (Transport endpoint is not connected)
[2009/11/12 21:00:24, 0] lib/util_sock.c:730(write_data)
[2009/11/12 21:00:24, 0] lib/util_sock.c:1468(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2009/11/12 21:00:24, 0] smbd/process.c:62(srv_send_smb)
  Error writing 53 bytes to client. -1. (Transport endpoint is not connected)
[2009/11/12 21:00:24, 2] smbd/close.c:612(close_normal_file)
  christoph closed file large.001 (numopen=0) NT_STATUS_OK
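
A log excerpt like the one above can be summarized with a few lines of Python. This is a minimal sketch that assumes the header layout seen in this excerpt (`[date time, level] file:line(function)`); other Samba versions or settings may format headers differently, and `summarize` is a name made up for this example.

```python
import re
from collections import Counter

# Header lines as they appear in the excerpt above, e.g.
# [2009/11/12 21:00:24, 0] lib/util_sock.c:730(write_data)
HEADER = re.compile(
    r"^\[(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),\s*(?P<level>\d+)\]\s+"
    r"(?P<source>\S+?):(?P<line>\d+)\((?P<func>\w+)\)\s*$"
)


def summarize(log_text):
    """Count header lines per (source file, function) and distinct message lines."""
    calls = Counter()
    messages = Counter()
    for raw in log_text.splitlines():
        match = HEADER.match(raw)
        if match:
            calls[(match["source"], match["func"])] += 1
        elif raw.strip():
            # Anything that is not a header is treated as message text.
            messages[raw.strip()] += 1
    return calls, messages
```

Counting messages separately from headers sidesteps the fact that headers and message lines are interleaved in this excerpt, so a message cannot always be attributed to the header directly above it.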

Thierry Carrez (ttx) wrote :

Which process exactly ends up consuming 100% CPU: smbd, nmbd, or something else? When it fails at 3500 MB, does letting it run for 30 more minutes result in success or failure?

Christoph E (mail-christoph-evers) wrote :

It's smbd that consumes 100% CPU. I can't let it run for another 30 minutes, because the client immediately receives a "permission denied" error message and at the same time the Samba server's hard disk stops working. The smbd process at 100% then runs forever.
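
To narrow down what the runaway smbd process is doing, its state and CPU tick counters can be read from /proc. This is a minimal sketch assuming a Linux /proc filesystem with the field order documented in proc(5); the function names are made up for this example. If the kernel-mode counter (stime) keeps growing while the user-mode counter (utime) stays flat, that would suggest the time is being spent inside the kernel rather than in Samba's own code, which would also fit a process that cannot be killed.

```python
def parse_proc_stat(stat_line):
    """Extract name, state and CPU ticks from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split at the last closing parenthesis rather than on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition("(")
    fields = tail.split()
    return {
        "pid": int(pid),
        "comm": comm,
        "state": fields[0],         # R = running, D = uninterruptible sleep
        "utime": int(fields[11]),   # user-mode ticks (field 14 in proc(5))
        "stime": int(fields[12]),   # kernel-mode ticks (field 15 in proc(5))
    }


def read_proc_stat(pid):
    """Read and parse the stat file of a running process."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_proc_stat(handle.read())
```

Calling `read_proc_stat` twice a few seconds apart on the smbd PID and comparing the two results shows which of the counters is increasing.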

Chuck Short (zulcss)
Changed in samba (Ubuntu):
status: Incomplete → Confirmed
Chuck Short (zulcss) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. The issue that you reported is one that should be reproducible with the live environment of the Desktop CD of the development release - Lucid Lynx. It would help us greatly if you could test with it so we can work on getting it fixed in the next release of Ubuntu. You can find out more about the development release at http://www.ubuntu.com/testing/ . Thanks again and we appreciate your help.

Changed in samba (Ubuntu):
status: Confirmed → Triaged
Christoph E (mail-christoph-evers) wrote :

So, today I had some time to test with the new Ubuntu Lucid Lynx Beta 1. Result: the problem persists with the same effect. But today I also tested the copy process from another Ubuntu 9.10 machine to the 10.04 system. It behaves slightly differently:

1) The copy process starts immediately.
2) On the 10.04 Samba system, a process called gvfsd-smb raises the CPU load to 100% (on a Core 2 Duo @ 3.2 GHz). smbd stays at 14%.
3) The copy process works fine, but after I cancelled it, gvfsd-smb stays at 100%. Unlike smbd when copying from a Windows machine (smbd at 100%), though, gvfsd-smb could be killed easily.

Thierry Carrez (ttx) wrote :

That would point to an issue in libsmbclient, which is also used by gvfsd-smb to access shares.

gervin23 (gervin23) wrote :

Confirmed here with a 30 GB postgres backup from XP to Ubuntu 9.10 with encrypted home directory. Exact behaviour as described above (smbd at 100%, etc.).

Thierry Carrez (ttx) wrote :

Subscribing Dustin for reproduction/advice on the encrypted directory part of things...
Do you see any reason why libsmbclient-powered transfers of large files would fail on encrypted home directories ?

Dustin Kirkland  (kirkland) wrote : Re: [Bug 480849] Re: Samba daemon crashes when transfering large files to share in encrypted home dir

Those should work, Thierry (under the rest of the normal assumptions
... file size after padding isn't bigger than the disk, file name
after padding isn't > 256 characters, etc.).

Joshua Coombs (josh-coombs-gmail) wrote :

I've run up against this twice now.

Linux forgery 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64 GNU/Linux

The box is running smbd with the only share being on a dmcrypted partition. I have a dozen Windows systems, running everything from Win2k to 2k3, pushing nightly NTBackup dumps to the box. These dumps generate 9 GB to 250 GB files that are streamed to the box and then read back by the host Windows system to verify the dump. This works cleanly for about a week. When it fails, one or more smbd processes peg the CPU, along with a flush process. Any attempt to kill those processes hangs the shell, as does trying to do anything with the file system in question. Trying to reboot or shutdown -h also just hangs the current shell.

Jun 29 22:56:20 forgery in.rshd: Connection from <email address hidden> for backup
Jun 29 23:26:21 forgery in.rshd: Connection from <email address hidden> for backup
Jun 29 23:56:22 forgery in.rshd: Connection from <email address hidden> for backup
Jun 30 00:26:23 forgery in.rshd: Connection from <email address hidden> for backup
Jun 30 00:50:30 forgery kernel: [443134.639157] CPU 4
Jun 30 00:50:30 forgery kernel: [443134.641505] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss cryptd aes_x86_64 aes_generic dm_crypt sunrpc lp parport usbhid hid fbcon tileblit font bitblit softcursor ahci vga16fb tg3 vgastate cciss
Jun 30 00:50:30 forgery kernel: [443134.663020] Pid: 11710, comm: smbd Not tainted 2.6.32-22-server #36-Ubuntu ProLiant DL120 G6
Jun 30 00:50:30 forgery kernel: [443134.672521] RIP: 0010:[<ffffffff811dc694>] [<ffffffff811dc694>] T.1113+0x1e4/0x1f0
Jun 30 00:50:30 forgery kernel: [443134.681167] RSP: 0000:ffff88009fd77908 EFLAGS: 00010297
Jun 30 00:50:30 forgery kernel: [443134.687180] RAX: 0000000000000035 RBX: ffff8800466bc920 RCX: 0000000000000154
Jun 30 00:50:30 forgery kernel: [443134.695229] RDX: 0000000000000036 RSI: 0000000000000035 RDI: 0000000000000153
Jun 30 00:50:30 forgery kernel: [443134.703278] RBP: ffff88009fd77958 R08: 0000000000000000 R09: 0000000000000000
Jun 30 00:50:30 forgery kernel: [443134.711326] R10: 0000000010f0eda2 R11: 0000000000000000 R12: ffff8800466bc880
Jun 30 00:50:30 forgery kernel: [443134.719374] R13: 0000000000000035 R14: 00000000012038b4 R15: ffff8800466bcbe8
Jun 30 00:50:30 forgery kernel: [443134.727425] FS: 00007f3d87fd2720(0000) GS:ffff880005700000(0000) knlGS:0000000000000000
Jun 30 00:50:30 forgery kernel: [443134.736539] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 30 00:50:30 forgery kernel: [443134.743029] CR2: 00007f3d87fe5000 CR3: 000000003788f000 CR4: 00000000000006e0
Jun 30 00:50:30 forgery kernel: [443134.751076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 30 00:50:30 forgery kernel: [443134.759123] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 30 00:50:30 forgery kernel: [443134.767171] Process smbd (pid: 11710, threadinfo ffff88009fd76000, task ffff880138582de0)
Jun 30 00:50:30 forgery kernel: [443134.778715] ffff880000000000 ffff880139e7c400 0000000000000000 00000000012038b4
Jun 30 00:50:30 forgery kernel: [443134.786898] <0> ffffffff00000041 ffff8800...
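
When comparing kernel traces like this one across several hangs, it can help to pull out just the process and the code location. This is a minimal sketch that assumes the `Pid: ..., comm: ...` line and the `symbol+0xoffset/0xsize` notation seen in the excerpt above; `oops_summary` is a name made up for this example.

```python
import re

PID_COMM = re.compile(r"Pid: (\d+), comm: (\S+)")
# First token of the form symbol+0xOFFSET/0xSIZE, e.g. T.1113+0x1e4/0x1f0
SYMBOL = re.compile(r"(\S+\+0x[0-9a-f]+/0x[0-9a-f]+)")


def oops_summary(log_text):
    """Return the pid, command name and first symbol+offset found in a trace."""
    pid_match = PID_COMM.search(log_text)
    symbol_match = SYMBOL.search(log_text)
    return {
        "pid": int(pid_match.group(1)) if pid_match else None,
        "comm": pid_match.group(2) if pid_match else None,
        "symbol": symbol_match.group(1) if symbol_match else None,
    }
```

If the same symbol shows up in every trace, that is a hint that the hangs share a cause; differing symbols would point towards separate problems.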

Revision history for this message
Joshua Coombs (josh-coombs-gmail) wrote :

Just had the problem occur again; this time dmcrypt wasn't in use on the volume with the hosted share. dmcrypt is in use on the system, though. The machine is hung badly enough that my serial console isn't functional, so I can't grab any additional debug info from this instance. Looks like this may be #474089 instead?

