Connection to ICE-unix/.. socket times out so programs take minutes to start

Bug #58171 reported by Nikolaus Rath
58
This bug affects 7 people
Affects Status Importance Assigned to Milestone
gnome-session (Ubuntu)
Confirmed
Medium
Unassigned
Declined for Gutsy by Henrik Nilsen Omma
Declined for Hardy by Steve Langasek
xorg (Ubuntu)
Invalid
Undecided
Unassigned
Declined for Gutsy by Henrik Nilsen Omma
Declined for Hardy by Steve Langasek

Bug Description

Sometimes X11 programs take ages to start, there is no CPU activity and the programs seem to wait for something. Same affected programs are gthumb, openoffice, totem, but e.g. firefox is not affected. The behaviour does not always occur, but if, then it happens for all programs.

Here's a strace from gthumb:

[0] nokile:~$ strace -o log gthumb

[ Nothing happens for about 5 minutes ]

[1]+ Stopped strace -o log gthumb
[148] nokile:~$ bg
[1]+ strace -o log gthumb &

[0] nokile:~$ tail log
chmod("/home/nikratio/.gnome2_private/", 0700) = 0
mkdir("/home/nikratio/.gnome2/accels", 0700) = -1 EEXIST (File exists)
socket(PF_FILE, SOCK_STREAM, 0) = 10
uname({sys="Linux", node="nokile", ...}) = 0
connect(10, {sa_family=AF_FILE, path="/tmp/.ICE-unix/5122"}, 21) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
write(10, "\0\1\0\0\0\0\0\0", 8) = 8
read(10, 0x8118548, 8) = ? ERESTARTSYS (To be restarted)
--- SIGCONT (Continued) @ 0 (0) ---

Revision history for this message
Nikolaus Rath (nikratio) wrote :

Version: 3:2.7.6-0ubuntu1

Revision history for this message
Daniel Holbach (dholbach) wrote :

Thanks for your bug report. Can you try to get a backtrace?

$ gdb thumb
...
(gdb) run
...
Ctrl-C
(gdb) thread apply all bt

Changed in gthumb:
assignee: nobody → desktop-bugs
importance: Untriaged → Medium
status: Unconfirmed → Needs Info
Revision history for this message
Daniel Holbach (dholbach) wrote :

You could also try if it works with a new user.

Revision history for this message
Nikolaus Rath (nikratio) wrote :

It is a sporadic error, so I have to wait until it happens again. (but if it happens, I can reproduce it again and again, I don't know what causes this behaviour to stop and to start).

Revision history for this message
Nikolaus Rath (nikratio) wrote :

(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)

Program received signal SIGINT, Interrupt.
[Switching to Thread -1225877824 (LWP 16951)]
0xffffe410 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 1 (Thread -1225877824 (LWP 16951)):
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7302543 in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb7eaed8f in _IceTransFreeConnInfo () from /usr/lib/libICE.so.6
#3 0xb7eaf1ad in _IceTransRead () from /usr/lib/libICE.so.6
#4 0xb7eb35ff in _IceRead () from /usr/lib/libICE.so.6
#5 0xb7eb3bc9 in IceProcessMessages () from /usr/lib/libICE.so.6
#6 0xb7eabc7e in IceOpenConnection () from /usr/lib/libICE.so.6
#7 0xb7ec1152 in SmcOpenConnection () from /usr/lib/libSM.so.6
#8 0xb7eee431 in gnome_client_connect () from /usr/lib/libgnomeui-2.so.0
#9 0xb7eee5aa in gnome_client_new () from /usr/lib/libgnomeui-2.so.0
#10 0xb7e3a7fb in gnome_program_postinit () from /usr/lib/libgnome-2.so.0
#11 0xb7e3abd0 in gnome_program_postinit () from /usr/lib/libgnome-2.so.0
#12 0xb7e3aed9 in gnome_program_init () from /usr/lib/libgnome-2.so.0
#13 0x0809fa65 in main ()
(gdb)

Nikolaus Rath (nikratio)
description: updated
Changed in gthumb:
status: Needs Info → Unconfirmed
Changed in xorg:
assignee: desktop-bugs → ubuntu-x-swat
Revision history for this message
Nikolaus Rath (nikratio) wrote :

Any news on this bug? It is still occuring regularly...

Revision history for this message
Olivier Tilloy (osomon) wrote :

The same bug sometimes happens to me at unpredictable moments.
When it does, most of the programs I try to launch hang in the same way.
Running a strace on them show exactly the same problem ; reading from a socket times out after ages:

olivier@sanctuary:~$ strace gnome-terminal

execve("/usr/bin/gnome-terminal", ["gnome-terminal"], [/* 34 vars */]) = 0

[...]

connect(10, {sa_family=AF_FILE, path="/tmp/.ICE-unix/9600"}, 21) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
write(10, "\0\1\0\0\0\0\0\0", 8) = 8
read(10,

...hanged.

Revision history for this message
Florian Boucault (fboucault) wrote :

Confirming according to oSoMoN's comment.

Changed in xorg:
status: Unconfirmed → Confirmed
Revision history for this message
everamzah (everamzah) wrote :

i've gotten the same problems. it _is_ sporadic. when it happens again i will try to do my first gdb backtrace.

it all started when i made a new user account.

i use xdm

Revision history for this message
Derek (bugs-m8y) wrote :

Just ran into same issue with gvim, but fairly repeatedly.
connect(4, {sa_family=AF_FILE, path="/tmp/.ICE-unix/4929"}, 21) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
write(4, "\0\1\0\0\0\0\0\0", 8) = 8
read(4,

strace of gvim
gvim just hangs out there doing nothing.
Kind of annoying, fortunately not noted with anything else yet, although this *is* a fresh feisty install.

Revision history for this message
Derek (bugs-m8y) wrote :

Hm. My issue, while similar symptomatically, seems related to RapidSVN.
When executing gvim from commandline, is fairly responsive.
When opening gvim from RapidSVN and doing a diff, hangs about half the time (RapidSVN is also very slow and doing a lot of read/writing to file handle in the Log dialog).
While rapidsvn has gvim spawned, any launch of gvim from console hangs on above ICE-unix thing.
After running a killall gvim, running gvim from console functions again, including commands similar to the one rapidsvn is running, such as:
gvim -f -d /tmp/file1-347.5.java /tmp/file1-2.5.java

Revision history for this message
Derek (bugs-m8y) wrote :

Nice. While these hanging-in-background gvim processes are doing god-knows-what I can't launch gnome-terminal either.
I had to switch to a console to do a killall gvim before the two gnome-terminal instances hanging on launch in read of ICE-unix would respond.
Kinda puzzled just what libICE does that would require this behaviour.

Revision history for this message
Derek (bugs-m8y) wrote :

Solution personally is to uninstall RapidSVN.
eSVN works well enough.
Hopefully comments might give a clue as to what is going on for others though.

Revision history for this message
Derek (bugs-m8y) wrote :

Addendum for those who might run into this with RapidSVN - eSVN needs 0.6.12 to function with new SVN changes - inconveniently, Debian and Ubuntu have yet to make a single .deb of eSVN, even though the release is now 3 months ago.
Requires building by hand at present and leaving outside of source control.
That or putting up with no file info in the gui.

Revision history for this message
Giuliano Procida (giuliano-procida) wrote :

I've noticed very similar behaviour. It is instructive to strace -r -p $pid your running x-session-manager that is on the other end of the socket. In my case it shows it reading resolv.conf and asking my ADSL router (as DNS server) about my machine name. This times out after 5 seconds. tcpdump shows at least two AAAA requests for the name coming back NXDomain.

My nsswitch.conf has:

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

My /etc/resolv.conf is (modulo blank lines):

# generated by NetworkManager, do not edit!
nameserver 192.168.150.1

The Right Thing may just be to add an alias in /etc/hosts so that the machine name resolves to 127.0.0.1 (it has the advantage of eliminating the delays), but I doubt it. My Debian machine does not suffer this issue (but then, I don't use Gnome there).

So just what is gnome-session doing gethostbyname if it does not apparently use the result (strace shows no new socket being opened after the lookup)?

Giuliano.

Revision history for this message
Giuliano Procida (giuliano-procida) wrote :

I should have read the bug more carefully. The original issue is a hang, not an extra 5 second delay starting most applications.

Giuliano.

Revision history for this message
Alex Nehaichik (nab) wrote :

I have same problem. Here is my strace for yelp. Socket /tmp/.ICE-unix/5214 was created by gnome-session.

Revision history for this message
Peter Clifton (pcjc2) wrote :

I'm changing the package to gnome-session, as I believe that is more directly responsible than Xorg here.

Revision history for this message
Alex Nehaichik (nab) wrote :

After disabling ipv6 the problem is gone. The command for disable:

sudo sed -i 's/alias net-pf-10 ipv6/alias net-pf-10 off/g' /etc/modprobe.d/aliases

Revision history for this message
Peter Clifton (pcjc2) wrote :

Ah... I think I recall seeing an update in Gutsy proposed which disabled AAAA lookups on IPV6 for certain configs, so perhaps this helps things?

https://edge.launchpad.net/ubuntu/+source/glibc

glibc (2.6.1-1ubuntu10) gutsy-proposed; urgency=low

  * Reapply any/local-ipv6-sanity.diff, lost when merging 2.6. LP: #156720.
    Only do AAAA lookups if we have an interface with better than link-local
    addresses available.

 -- Matthias Klose <email address hidden> Wed, 24 Oct 2007 18:46:56 +0200

Revision history for this message
Benjamin Redelings (benjamin-redelings) wrote :

This bug is still here. I have an AMD64 system running current hardy, and I have been observing this bug intermittently for a while. Like the other posters, I see a hang on /tmp/.ICE-unix/* when this happens. So, the IPV6 changes to glibc did not fix it. (2.6.1--6ubuntu2)

This bug tends to happen when I am connected to a low-signal strength wireless access point. If the connection stops working, then programs such as xterm, gnome-terminal, evince, etc all stall for a very long time before starting.

I am surprised that this bug has survived so long - it is pretty severe when it occurs!

Is there anything I can do that will help? Backtraces of gnome-session?

-BenRI

Revision history for this message
Jan Sebesta (sebestajan) wrote :

Does anybody found correct solution?

I have Ubuntu 7.04 Feisty (kernel 2.6.20-16-generic and gnome desktop) and have exactly the same problem. When I unplug network cable, then any application execution is blocked for about 5 or 10 seconds. strace gives the same output as it has been posted by another guys here.

I tried to switch off ipv6 as it has been posted by Alexander, but without success.

I found in internet, that /tmp/.ICE-unix folder contains X11 session information. So it seems for me, that the problem is in x11 daemon, which executes applications running on it.

I am not expert on GUI in Linux, but I can always easily reproduce the error and I can try to debug or whatever else, just tell me how.. Or maybe is it fixed in some new update?

Jan

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I recently started seeing this behavior on up to date hardy. strace of various application (in this case, xterm) shows:
connect(4, {sa_family=AF_FILE, path="/tmp/.ICE-unix/9709"}, 21) = 0
 fcntl(4, F_SETFD, FD_CLOEXEC) = 0
write(4, "\0\1\0\0\0\0\0\0", 8) = 8
read(4,

Then hangs forever. If I strace x-session-manager (in this case, gnome-session), I see:

$ strace -p 9709
Process 9709 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>

Tried launching stuff, but nothing. The I tried strace with '-f', and got this extra information:
[pid 9790] read(24,

and it hangs there. I don't have a pig 9790 (a thread maybe?). I do also have /usr/bin/seahorse-agent --execute x-session-manager, but this has been enabled for sometime, and I only within the last couple of weeks started to have this problem.

This is somewhat disturbing, as I wasn't actively fiddling with anything-- just 'working'. At the time of the hangs, I have in my logs:
Mar 21 10:41:43 severus dnsmasq[6127]: DHCPREQUEST(vnet0) 192.168.122.155 52:54:00:6b:39:72
Mar 21 10:41:43 severus dnsmasq[6127]: DHCPACK(vnet0) 192.168.122.155 52:54:00:6b:39:72 hardy-amd64-bind9
Mar 21 10:42:29 severus avahi-daemon[6009]: Withdrawing address record for fe80::2ff:c1ff:fe00:f992 on vnet1.

I *do* have ipv6 enabled. I have seen other bugs with avahi, and wonder if it is avahi related (eg bug #193432, but there are others).

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I didn't add all the entries:
Mar 21 10:41:43 severus dnsmasq[6127]: DHCPREQUEST(vnet0) 192.168.122.155 52:54:00:6b:39:72
Mar 21 10:41:43 severus dnsmasq[6127]: DHCPACK(vnet0) 192.168.122.155 52:54:00:6b:39:72 hardy-amd64-bind9
Mar 21 10:42:29 severus kernel: [130310.629626] vnet0: port 1(vnet1) entering disabled state
Mar 21 10:42:29 severus avahi-daemon[6009]: Withdrawing address record for fe80::2ff:c1ff:fe00:f992 on vnet1.
Mar 21 10:42:29 severus kernel: [130310.642074] device vnet1 left promiscuous mode
Mar 21 10:42:29 severus kernel: [130310.642086] audit(1206110549.802:36): dev=vnet1 prom=0 old_prom=256 auid=4294967295
Mar 21 10:42:29 severus kernel: [130310.642091] vnet0: port 1(vnet1) entering disabled state

Revision history for this message
Bryce Harrington (bryce) wrote :
Download full text (5.6 KiB)

Hmm, not sure what's going on. It might help to run xterm with its TRACE messages flipped on. Here's a package with this done:

  http://people.ubuntu.com/~bryce/Testing/ICE-unix/

I also encourage everyone experiencing this issue to install debug packages of gnome-session, libice, libgnomeui, and libgnome as a minimum.

The fact that it's hanging right after establishing a connection via .ICE-unix seems like the strongest clue we have so far. On my system (which is working properly), here is what has this open, for your comparison:

bryce@chideok:~/src/xorg-server/xorg-server-1.4.1~git20080131-patched$ lsof | grep ICE
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
x-session 6021 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
x-session 6021 bryce 22u unix 0xdf900540 14732 /tmp/.ICE-unix/6021
x-session 6021 bryce 27u unix 0xf7317700 15228 /tmp/.ICE-unix/6021
x-session 6021 bryce 28u unix 0xf0e2e700 15324 /tmp/.ICE-unix/6021
x-session 6021 bryce 29u unix 0xf0e3d000 15336 /tmp/.ICE-unix/6021
x-session 6021 bryce 30u unix 0xf0e67540 15949 /tmp/.ICE-unix/6021
x-session 6021 bryce 31u unix 0xf0deea80 15950 /tmp/.ICE-unix/6021
x-session 6021 bryce 32u unix 0xf0ddc000 16164 /tmp/.ICE-unix/6021
x-session 6021 bryce 33u unix 0xf0f6b1c0 16656 /tmp/.ICE-unix/6021
x-session 6021 bryce 34u unix 0xf0fe0700 17015 /tmp/.ICE-unix/6021
seahorse- 6100 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-set 6109 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-scr 6139 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
metacity 6144 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
nautilus 6147 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
update-no 6158 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
evolution 6167 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-vol 6174 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-pow 6177 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
trashappl 6206 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
mixer_app 6242 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
deskbar-a 6245 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
fast-user 6248 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
notificat 6284 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0

The straces unfortunately are far too low level to give much insight into where the failure is occurring, other than indicating it's happening during an ICE network negotiation. What would really be useful in...

Read more...

Revision history for this message
Nikolaus Rath (nikratio) wrote :

I haven't experienced the problem in the last 6 months, so it seems to be fixed.

Revision history for this message
Bryce Harrington (bryce) wrote :

Closing as per comment #26

Changed in gnome-session:
status: Confirmed → Fix Released
Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

This problem occurs to me on a freshly installed, updated Hardy on amd64. When it's happening, OpenOffice does not start and vim opens very slowly. Thus, it makes a desktop nearly unusable.

Changed in gnome-session:
status: Fix Released → Confirmed
Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

When this problem is happening:
- OpenOffice does not start
- vim is starts slowly
- It takes 7 seconds to open an xterm window

The socket in /tmp/.ICE-unix/ has a different number than the one programs are looking for.

I time over 7 s when I start xterm and close the window as soon as it appears:
$ time xterm
Warning: Tried to connect to session manager, Could not open network socket
real 0m7.374s
...

Vim starts slowly, strace mentions ICE-unix
$ strace vim newfile.txt
...
nanosleep({1, 0}, {1, 0}) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
uname({sys="Linux", node="foobar", ...}) = 0
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6026"}, 21) = -1 ENOENT (No
such file or directory)
close(3)

Indeed, this /tmp/.ICE-unix/6026 does not exist. Instead, a similarly named socket with a different number is found:
$ ls -ld /tmp/.ICE-unix/ /tmp/.ICE-unix/*
drwxrwxrwt 2 root root 4096 2008-08-11 12:33 /tmp/.ICE-unix/
srwxrwxrwx 1 tero tero 0 2008-08-11 12:33 /tmp/.ICE-unix/6943

Don't know if this is related, but localhost and hostname are mapped to different IP's in the loopback block.
$ cat /etc/hosts
...
127.0.0.1 localhost
127.0.1.1 foobar
...

Running Ubuntu 8.04.1 hardy. IPv6 is not disabled. Currently, DNS resolution is quite slow, don't know if it is related. I have network-manager 0.6.6-0ubuntu5, xserver-xorg 1:7.3+10ubuntu10.2, xterm 229-1ubuntu1, vim 1:7.1-138+1ubuntu3.

Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

Problem can be bypassed by creating a symlink.

From strace I can see that programs are looking for socket "/tmp/.ICE-unix/6026". With 'ls', I can see that instead there is a socket "/tmp/.ICE-unix/6943". So I can bypass the problem by creating a symlink:

$ cd /tmp/.ICE-unix/
$ ln -s 6943 6026
$ ls -l
total 0
lrwxrwxrwx 1 tee tee 4 2008-08-14 12:06 6026 -> 6943
srwxrwxrwx 1 tee tee 0 2008-08-11 12:33 6943

After that, xterm opens immediately.

$ xterm
Warning: Tried to connect to session manager, Authentication Rejected, reason : MIT-MAGIC-COOKIE-1 authentication rejected

Vim also opens without delay. However, OpenOffice still does not work.

Revision history for this message
Sebastien Bacher (seb128) wrote :

how do you start your session? do you have the issue every time? could you describe how to trigger it?

Changed in gnome-session:
assignee: ubuntu-x-swat → nobody
status: Confirmed → Incomplete
Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

Session is started in the default way: I have hardy amd64 installed from live cd. I start my session by logging in from gdm's graphical login screen.

I tried logging out and restarting X with alt-ctrl-backspace, but it did not fix the problem. I just installed this machine, so I don't know if it happens a lot.

I have not yet developed a guaranteed way to trigger this problem or a guaranteed way to get rid of it. Earlier commenters say it happens unpredictably.

Changed in gnome-session:
status: Incomplete → Confirmed
Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

This is how OpenOffice.org does not start:

$ strace oowriter
...
rt_sigaction(SIGTERM, {SIG_DFL}, NULL, 8) = 0
read(10, "#!/bin/sh\n/usr/lib/openoffice/pr"..., 8192) = 59
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3fd7387770) = 9590
wait4(4294967295, javaldx: Could not find a Java Runtime Environment!
0x7fffdf3a301c, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
wait4(4294967295,

In the end, it seems to wait forever (no change in 10 min). Strace output really ends with a partial line "wait4(4294967295,". Error message "javaldx: Could not find a Java Runtime Environment!" is printed also without strace. Top shows that OOo does not take very much memory or CPU time. I am not sure if this is the same ICE-unix bug or a different one.

Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

I had the same problem on a different machine, an amd64 Gutsy 7.10.

Could be bypassed like above, by symlinking the .ICE-unix/NUMBER from strace to .ICE-unix/CORRECTNUMBER.

Symptoms included slow evince start, can't open menu in OpenOffice, OpenOffice crashing.

The machine had been running long time with the same X session. Then I logged out, changed language and logged back in. Problem probably started after logging back in.

Revision history for this message
Per (per-mathisen) wrote :

Could this be a problem with resolving the local hostname? Setting the local hostname back to localhost.localdomain in Network Manager, which incidentially matches the hostname mentioned in /etc/hosts, fixed the problem for me when I encountered it.

Revision history for this message
Smeuuh (smeuuh-free) wrote :

Hi,
I've got a similar bug in Intrepid (since release : didn't affect me in alphas). After running my computer for some time, x-session-manager just loops (full CPU usage), and applications take minutes to start. Same strace symptoms. If that matters, my /etc/hosts looks like this (my hostname is john) :
127.0.0.1 john localhost.localdomain localhost
127.0.0.1 john

Revision history for this message
Smeuuh (smeuuh-free) wrote :

Hi,
happened to me again. My .xsession-errors gets flooded with messages saying it doesn't connect to ICE. Here's a sane strace emacs :

connect(4, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6212"}, 21) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
write(4, "\0\1\0\0\0\0\0\0", 8) = 8
read(4, "\0\1\0\267\0\0\0\0", 8) = 8

When buggy, it becomes :
connect(4, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6241"}, 21) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
write(4, "\0\1\0\0\0\0\0\0", 8) = 8
read(4, <unfinished ...>

The bug seems to differ of the one other commenters have in that /tmp/.ICE-unix/6241 does exist.

Revision history for this message
borjasotomayor (borjasotomayor) wrote :

I am observing the same problem on a 64bit Intrepid (this problem didn't show up with beta; it started when I updated to the final release of Interpid). Trying to start any X program (gedit, firefox, etc.) from the terminal hangs. strace reveals the following:

socket(PF_FILE, SOCK_STREAM, 0) = 6
uname({sys="Linux", node="castafiore", ...}) = 0
connect(6, {sa_family=AF_FILE, path="/tmp/.ICE-unix/8605"}, 21) = 0
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
write(6, "\0\1\0\0\0\0\0\0", 8) = 8
read(6, <hangs... >

/tmp/.ICE-unix/8605 does exist, and process 8605 is x-session manager. When doing strace -r into x-session manager, I observe this message over and over again:

     0.000027 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=18, events=POLLIN|POLLPRI}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=24, events=POLLIN}, {fd=26, events=POLLIN}, {fd=28, events=POLLIN}, {fd=27, events=POLLIN}, {fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}, {fd=35, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}], 19, -1) = 1
     0.000057 accept(9, 0x7fffba932f20, [4294967406]) = -1 EMFILE (Too many open files)
     0.000027 write(2, "_IceTrans", 9) = 9
     0.000031 write(2, "SocketUNIXAccept: accept() faile"..., 34) = 34
     0.000036 read(3, 0x119a6e4, 4096) = -1 EAGAIN (Resource temporarily unavailable)

Number of open files:

$ lsof -n | grep /tmp/.ICE-unix/8605 | wc --lines
1003

$ lsof -n | grep "x-session 8605" | wc --lines
1100

~$ lsof -n | grep "x-session 8605" | tail
x-session 8605 borja 1014u unix 0xffff880013808580 2934726457 /tmp/.ICE-unix/8605
x-session 8605 borja 1015u unix 0xffff8800138f9340 2934726532 /tmp/.ICE-unix/8605
x-session 8605 borja 1016u unix 0xffff8800138f9e40 2934726626 /tmp/.ICE-unix/8605
x-session 8605 borja 1017u unix 0xffff8800138fa100 2934726701 /tmp/.ICE-unix/8605
x-session 8605 borja 1018u unix 0xffff8800138fa680 2934726801 /tmp/.ICE-unix/8605
x-session 8605 borja 1019u unix 0xffff8800138fac00 2934726876 /tmp/.ICE-unix/8605
x-session 8605 borja 1020u unix 0xffff8800138fb180 2934726968 /tmp/.ICE-unix/8605
x-session 8605 borja 1021u unix 0xffff8800138f98c0 2934727043 /tmp/.ICE-unix/8605
x-session 8605 borja 1022u unix 0xffff880013d34840 2934727135 /tmp/.ICE-unix/8605
x-session 8605 borja 1023u unix 0xffff8800138f8000 2934727210 /tmp/.ICE-unix/8605

The only solution I have been able to find is to restart the X server. After doing this, it will work ok for a while, and after a couple of hours I will be unable to start any new X programs.

Revision history for this message
Smeuuh (smeuuh-free) wrote :

As a temporary workaround I have set the limit of open files to a very large value.
The problem is that gnome-session, for some reason, doesn't close the socket when it should, which raises an error when the max limit of open files is reached. I have no idea where this comes from, but since we know approximately the time at which the bug was introduced in intrepid, it should be easy for someone knowledgeable to locate the faulty patch.

Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

Smeuuh: It's not just 8.10 Intrepid, it also happens in 8.04 Hardy. Or did you mean that it was fixed in Intrepid alpha, then broken on Intrepid release?

Thanks for the "open files limit" bug bypass and diagnoses.

Revision history for this message
Smeuuh (smeuuh-free) wrote :

It seems borjasotomayor and I experienced the same bug : everything was fine with alphas and betas until an update around intrepid release which broke it.
It also seems symptoms are not the same in our case than in yours : your problem was that the socket was misnamed for some reason, ours that gnome-session doesn't close files. If it happens again, could you try the manipulation borjasotomayor did and check the number of open files ? If it's around 1024 then it's probably the same bug as ours

Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote :

Yes, I'll be happy to try out "open files limit" bug bypass as soon as I see the symptoms again.

Revision history for this message
Bryce Harrington (bryce) wrote :

[This is an automated message]

Hi nikratio,

Thanks for including the attached files. Could you also include your /var/log/Xorg.0.log?

Changed in xorg:
status: New → Incomplete
Revision history for this message
Bryce Harrington (bryce) wrote :

We're closing this bug since it is has been some time with no response from the original reporter. However, if the issue still exists please feel free to reopen with the requested information. Also, if you could, please test against the latest development version of Ubuntu, since this confirms the bug is one we may be able to pass upstream for help.

Changed in xorg:
status: Incomplete → Invalid
Revision history for this message
Eric Julien (eric-julien) wrote :

I have the same problem using Intrepid on x86_64 (fully updated as of today)

Trying to start nautilus, I get this strace
...
09:21:58 gettimeofday({1233930118, 706768}, NULL) = 0
09:21:58 gettimeofday({1233930118, 706857}, NULL) = 0
09:21:58 gettimeofday({1233930118, 706907}, NULL) = 0
09:21:58 socket(PF_FILE, SOCK_STREAM, 0) = 10
09:21:58 uname({sys="Linux", node="jule2401-desktop", ...}) = 0
09:21:58 connect(10, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6735"}, 21) = 0
09:21:58 fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
09:21:58 write(10, "\0\1\0\0\0\0\0\0", 8) = 8
09:21:58 read(10,

Stracing x-session-manager gives
     0.000114 accept(9, 0xbfc4bf1a, [110]) = -1 EMFILE (Too many open files)
     0.000028 write(2, "_IceTrans", 9) = 9
     0.000029 write(2, "SocketUNIXAccept: accept() faile"..., 34) = 34
     0.000039 read(3, 0x8309b6c, 4096) = -1 EAGAIN (Resource temporarily unavailable)
     0.000030 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=18, events=POLLIN|POLLPRI}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=24, events=POLLIN}, {fd=26, events=POLLIN}, {fd=28, events=POLLIN}, {fd=27, events=POLLIN}, {fd=30, events=POLLIN}, {fd=29, events=POLLIN}, {fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}, {fd=118, events=POLLIN}, {fd=120, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}], 22, -1) = 1
     0.000113 accept(9, 0xbfc4bf1a, [110]) = -1 EMFILE (Too many open files)
     0.000029 write(2, "_IceTrans", 9) = 9
     0.000028 write(2, "SocketUNIXAccept: accept() faile"..., 34) = 34
     0.000040 read(3, 0x8309b6c, 4096) = -1 EAGAIN (Resource temporarily unavailable)
     0.000029 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=18, events=POLLIN|POLLPRI}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=24, events=POLLIN}, {fd=26, events=POLLIN}, {fd=28, events=POLLIN}, {fd=27, events=POLLIN}, {fd=30, events=POLLIN}, {fd=29, events=POLLIN}, {fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}, {fd=118, events=POLLIN}, {fd=120, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}], 22, -1) = 1

 lsof | wc -l
8087

I attached output of LSOF

Revision history for this message
Eric Julien (eric-julien) wrote :

Pressed save too fast...

I must add I get the problem since everyday, about 30 mins or so after starting up, since about a week or so.

Thanks!

Revision history for this message
cantona (paradyse) wrote :

any workaround?
increase "open files limit" doesn't help!

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 8189
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 900000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 8189
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

connect(5, {sa_family=AF_FILE, path="/tmp/.ICE-unix/16208"}, 22) = 0
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
write(5, "\0\1\0\0\0\0\0\0", 8) = 8
read(5,

# lsof -n | grep /tmp/.ICE-unix/16208 |wc -l
1003

Revision history for this message
cantona (paradyse) wrote :

Ubuntu 8.10

Revision history for this message
PoltoS/ (poltorak) wrote :

The same problem for me since mai 2009.

For me it seems to be related with wifi manipulations: disconnect from AP, reconnect to AP, ...

If the problem occurs, all programs trying to read or write from PF_FILE socket hangs up.
gnome-terminal hangs up on reading /tmp/.ICE-unix/xxxxx
sudo (run in working gnome-terminal) and login (while trying to login on tty1) hags on writing to /dev/log (also PF_FILE socket)

i don't think this is related to gnome-session or xorg. i don't know the launching mechanism, but once the problem triggered it is related to read/write of file sockets.

other programs not using PF_FILE sockets just works.
nothing helps except reboot :(

Ubuntu 9.04 (fully updated version by version since 6.10 or 7.04)

any suggestions? very annoying problem forcing me to reboot 5 time a day.

Revision history for this message
João Bertacchi (joaobertacchi) wrote :

I had a similar problem with Debian Lenny. I solved it changing /etc/hosts file.
I've changed the line "127.0.0.1 localhost" to "127.0.0.1 localhost <my_hostname>".

Revision history for this message
Alfonso Acosta (alfonso-acosta) wrote :

I am experiencing the same problem in a fresh amd64 installation of lucid. The bug persists after upgrading to maverick. I would paste the strace of gnome-session and X applications but is the same as the ones pasted above (apps get stuck reading from /tmp/.ICE-unix and gnome-session gets stuck in a call to poll)

Adding my local host-name to /etc/hosts did not help. Neither did disabling ipv6.

In my particular case, the problem only happens when connected to the internet through wifi (everything works while connected through my ethernet card or if I disable the wifi interface).

Revision history for this message
buguldey (buguldey) wrote (last edit ):
Download full text (25.2 KiB)

I still have this (or very similar) bug on Ubuntu 20.04.6 at March 2024. Researching it.

: strace firefox&
[...]
uname({sysname="Linux", nodename="desk1", ...}) = 0
connect(13, {sa_family=AF_UNIX, sun_path=@"/tmp/.ICE-unix/3585"}, 22) = 0
fcntl(13, F_SETFD, FD_CLOEXEC) = 0
write(13, "\0\1\0\0\0\0\0\0", 8) = 8
read(13, [hangs here]

: lsof | grep ICE|grep -v ICEBUG|grep -v libICE
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
mate-sess 3585 user 14u unix 0x0000000000000000 0t0 49191 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 15u unix 0x0000000000000000 0t0 49192 /tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 20u unix 0x0000000000000000 0t0 45703 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 21u unix 0x0000000000000000 0t0 42672 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 22u unix 0x0000000000000000 0t0 47463 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 23u unix 0x0000000000000000 0t0 96699 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 24u unix 0x0000000000000000 0t0 50412 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 25u unix 0x0000000000000000 0t0 9376321 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 26u unix 0x0000000000000000 0t0 7924471 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 user 27u unix 0x0000000000000000 0t0 11296459 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 14u unix 0x0000000000000000 0t0 49191 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 15u unix 0x0000000000000000 0t0 49192 /tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 20u unix 0x0000000000000000 0t0 45703 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 21u unix 0x0000000000000000 0t0 42672 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 22u unix 0x0000000000000000 0t0 47463 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 23u unix 0x0000000000000000 0t0 96699 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 24u unix 0x0000000000000000 0t0 50412 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 25u unix 0x0000000000000000 0t0 9376321 @/tmp/.ICE-unix/3585 type=STREAM
mate-sess 3585 3610 mate-se:d user 26u unix 0x0000000000000000 0t0 7924471 @/tm...

Revision history for this message
buguldey (buguldey) wrote :

/var/log/Xorg.0.log

Revision history for this message
buguldey (buguldey) wrote :

lsof | grep ICE|grep -v ICEBUG > ~user/tmp/ICEBUG/lsof-ICE.log [this file attached]

Revision history for this message
buguldey (buguldey) wrote :
Download full text (5.3 KiB)

More details at bt:

: sudo apt install libice6-dbg
...
: gdb mate-terminal
...
Reading symbols from mate-terminal...
Reading symbols from /usr/lib/debug/.build-id/d8/a4bef96123f71cbc6a7ddfe696577b98810d23.debug...
(gdb) run
Starting program: /usr/bin/mate-terminal
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3345700 (LWP 551182)]
[New Thread 0x7ffff2b44700 (LWP 551183)]
[New Thread 0x7ffff2343700 (LWP 551184)]
[New Thread 0x7ffff1aa7700 (LWP 551185)]
^C
Thread 1 "mate-terminal" received signal SIGINT, Interrupt.
__libc_read (nbytes=8, buf=0x55555562dab0, fd=11) at ../sysdeps/unix/sysv/linux/read.c:26
26 ../sysdeps/unix/sysv/linux/read.c: Нет такого файла или каталога.
(gdb) thread apply all bt

Thread 5 (Thread 0x7ffff1aa7700 (LWP 551185)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007ffff71b3556 in g_cond_wait_until () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007ffff7135581 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff7135ba6 in g_async_queue_timeout_pop () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007ffff718f429 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff718eae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007ffff6fae609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7 0x00007ffff6ed3353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7ffff2343700 (LWP 551184)):
#0 0x00007ffff6ec6bbf in __GI___poll (fds=0x7fffe4014e40, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff716536e in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007ffff71656f3 in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff73bbf8a in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
#4 0x00007ffff718eae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff6fae609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007ffff6ed3353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7ffff2b44700 (LWP 551183)):
#0 0x00007ffff6ec6bbf in __GI___poll (fds=0x555555645680, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff716536e in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007ffff71654a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff344d99d in () at /usr/lib/x86_64-linux-gnu/gio/modules/libdconfsettings.so
#4 0x00007ffff718eae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff6fae609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007ffff6ed3353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7ffff3345700 (LWP 551182)):
#0 0x00007ffff6ec6bbf in __GI___poll (fds=0x5555556438e0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff716536e in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007ffff71654a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff71654f1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x...

Read more...

Revision history for this message
buguldey (buguldey) wrote :
Download full text (11.7 KiB)

Even more details (libsm6 + bt full):

: sudo apt install libsm6-dbgsym
...
: gdb mate-terminal
...
Reading symbols from mate-terminal...
Reading symbols from /usr/lib/debug/.build-id/d8/a4bef96123f71cbc6a7ddfe696577b98810d23.debug...
(gdb) run
Starting program: /usr/bin/mate-terminal
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3345700 (LWP 556105)]
[New Thread 0x7ffff2b44700 (LWP 556106)]
[New Thread 0x7ffff2343700 (LWP 556107)]
[New Thread 0x7ffff1aa7700 (LWP 556108)]
^C
Thread 1 "mate-terminal" received signal SIGINT, Interrupt.
__libc_read (nbytes=8, buf=0x55555562dab0, fd=11) at ../sysdeps/unix/sysv/linux/read.c:26
26 ../sysdeps/unix/sysv/linux/read.c: Нет такого файла или каталога.
(gdb) thread apply all bt full

Thread 5 (Thread 0x7ffff1aa7700 (LWP 556108)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007ffff71b3556 in g_cond_wait_until () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007ffff7135581 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff7135ba6 in g_async_queue_timeout_pop () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007ffff718f429 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff718eae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007ffff6fae609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737247868672, -1123668806880414095, 140737488341918, 140737488341919, 140737488342064, 140737247865408, 1123664018334332529, 1123649042188252785}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#7 0x00007ffff6ed3353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7ffff2343700 (LWP 556107)):
#0 0x00007ffff6ec6bbf in __GI___poll (fds=0x7fffe4014e40, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar = 18446744073709551100
        sc_cancel_oldtype = 0
#1 0x00007ffff716536e in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007ffff71656f3 in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff73bbf8a in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
#4 0x00007ffff718eae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff6fae609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737256896256, -1123668806880414095, 140737265284558, 140737265284559, 140737265284704, 140737256892992, 1123658295290410609, 1123649042188252785}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#6 0x00007ffff6ed3353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7ffff2b44700 (LWP 556106)):
#0 0x00007ffff6ec6bbf in __GI___poll (fds=0x555555645680, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar =...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.