Comment 25 for bug 58171

Bryce Harrington (bryce) wrote :

Hmm, not sure what's going on. It might help to run xterm with its TRACE messages flipped on. Here's a package with this done:

  http://people.ubuntu.com/~bryce/Testing/ICE-unix/

I also encourage everyone experiencing this issue to install debug packages of gnome-session, libice, libgnomeui, and libgnome at a minimum.

The fact that it's hanging right after establishing a connection via .ICE-unix seems like the strongest clue we have so far. On my system (which is working properly), here is what has this open, for your comparison:

bryce@chideok:~/src/xorg-server/xorg-server-1.4.1~git20080131-patched$ lsof | grep ICE
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
x-session 6021 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
x-session 6021 bryce 22u unix 0xdf900540 14732 /tmp/.ICE-unix/6021
x-session 6021 bryce 27u unix 0xf7317700 15228 /tmp/.ICE-unix/6021
x-session 6021 bryce 28u unix 0xf0e2e700 15324 /tmp/.ICE-unix/6021
x-session 6021 bryce 29u unix 0xf0e3d000 15336 /tmp/.ICE-unix/6021
x-session 6021 bryce 30u unix 0xf0e67540 15949 /tmp/.ICE-unix/6021
x-session 6021 bryce 31u unix 0xf0deea80 15950 /tmp/.ICE-unix/6021
x-session 6021 bryce 32u unix 0xf0ddc000 16164 /tmp/.ICE-unix/6021
x-session 6021 bryce 33u unix 0xf0f6b1c0 16656 /tmp/.ICE-unix/6021
x-session 6021 bryce 34u unix 0xf0fe0700 17015 /tmp/.ICE-unix/6021
seahorse- 6100 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-set 6109 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-scr 6139 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
metacity 6144 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
nautilus 6147 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
update-no 6158 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
evolution 6167 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-vol 6174 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
gnome-pow 6177 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
trashappl 6206 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
mixer_app 6242 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
deskbar-a 6245 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
fast-user 6248 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
notificat 6284 bryce mem REG 8,1 86160 48838478 /usr/lib/libICE.so.6.3.0
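
To make comparing against the listing above a bit easier, here is a rough sketch that boils lsof output down to a count of open /tmp/.ICE-unix sockets per process. The function name count_ice_sockets is just something made up for illustration, and it assumes the same column layout as my output above (TYPE in column 5, NAME last); other lsof versions may print extra columns.

```shell
# Hypothetical helper: reads `lsof` output on stdin and prints
# "<count> <command> <pid>" for each process holding /tmp/.ICE-unix sockets.
# Assumes TYPE is the 5th column and NAME is the last one, as in the listing above.
count_ice_sockets() {
    awk '$5 == "unix" && $NF ~ /^\/tmp\/\.ICE-unix\// { n[$1 " " $2]++ }
         END { for (k in n) print n[k], k }'
}

# Usage:
#   lsof | count_ice_sockets
```

On a working session I would expect the session manager to hold several of these; a stuck client showing none (or a socket path for a PID that no longer exists) would be worth mentioning in your comment.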

The straces unfortunately are far too low level to give much insight into where the failure is occurring, other than indicating it's happening during an ICE network negotiation. What would really be useful in troubleshooting this would be detailed backtraces. The backtrace in comment #5 is a starting point, but I'd like to see a '(gdb) backtrace full' on one of the stuck processes. See https://wiki.ubuntu.com/DebuggingXorg for some tips.
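
For those who haven't done this before, a minimal sketch of capturing that backtrace non-interactively. The helper name dump_backtrace and the default output filename are my own inventions; the gdb options used (-p, -batch, -ex) are standard. You'll want the debug packages installed first, otherwise the trace will be mostly question marks.

```shell
# Hypothetical helper: attach gdb to a running (stuck) process and write a
# full backtrace of every thread to a file that can be attached to the bug.
#   $1 = PID of the stuck process
#   $2 = output file (optional; defaults to backtrace-<pid>.txt)
dump_backtrace() {
    gdb -p "$1" -batch -ex "thread apply all backtrace full" \
        > "${2:-backtrace-$1.txt}" 2>&1
}

# Usage (which process is actually stuck on your system is an assumption here):
#   dump_backtrace "$(pgrep -o gnome-session)"
```

Attaching to a process owned by another user needs root, so prefix the call with sudo if gdb complains about permissions.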

Anyway, even though #5's backtrace is limited, it does give some clues. First, in browsing through the ICE code, the failure is happening extremely early on in the ICE negotiation - evidently right at the point of trying to establish the connection in IceOpenConnection. This suggests that perhaps the connection has become invalid since the last ICE call.

At this point, "something to do with avahi" does sound like the most plausible line of investigation. The working hypothesis, based on the accumulated comments so far, would be that the network connection flakes out, avahi takes over but puts things into an invalid state (perhaps due to ipv6, perhaps other reasons), and gnome-session/libice gets confused trying to connect to the ICE socket and fails early on in the negotiating process.

So, it would be good for people to collect backtraces and compare with comment #5 to see if their issue is hanging with the same sort of calls; if not, we might be dealing with multiple unrelated bugs.

I don't know much about avahi, but one path of investigation with that might be to search the Debian BTS, Xorg bugzilla, and avahi's upstream bug tracker for issues involving avahi and libICE. Would someone mind doing this?

A second path to investigate would be to do some network experiments, like disabling your wireless router or otherwise forcing your system's wireless network to drop, letting avahi come on, and starting an X client to see if you can trigger the bug deliberately that way. Then disable avahi (stop avahi-daemon) and repeat the experiment to see whether the .ICE-unix error still occurs when starting X clients. Depending on how this experiment turns out, the next step would be to contact the avahi developers about a fix (maybe there's a patch already on their mailing list?).
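
To keep the results of that experiment comparable between reporters, something like the following could be run at each stage (network up, network dropped, avahi stopped). This is only a sketch: snapshot_state is a made-up name, and it records nothing more than whether avahi-daemon is running and what is currently in /tmp/.ICE-unix.

```shell
# Hypothetical helper: print a labelled snapshot of the two things this
# experiment cares about, so runs can be compared side by side.
#   $1 = free-form label, e.g. "network-up" or "avahi-stopped"
snapshot_state() {
    echo "== $1"
    if pgrep -x avahi-daemon >/dev/null 2>&1; then
        echo "avahi-daemon: running"
    else
        echo "avahi-daemon: not running"
    fi
    ls -l /tmp/.ICE-unix/ 2>/dev/null || echo "(no /tmp/.ICE-unix directory)"
}

# Usage:
#   snapshot_state network-up      >> ice-experiment.log
#   (drop the network, start an X client)
#   snapshot_state network-dropped >> ice-experiment.log
```

Attaching the resulting log along with a note on whether the X client hung at each stage would make it much easier to tell whether avahi is really involved.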

A third path would be to set up a system *properly* configured with working avahi, and see if the problem occurs there as well. If it does, then the fault may lie with gnome-session, rather than avahi. Perhaps gnome-session or libice get confused by ipv6 or something - if it is possible to run avahi with ipv4, it might be of interest to repeat this experiment alternating between ipv4 and ipv6 to see if the issue occurs only on the latter.
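
On the ipv4/ipv6 question: as far as I can tell, avahi-daemon.conf has use-ipv4 and use-ipv6 switches in its [server] section, so checking what a given system is set to could look roughly like this. The helper name and the default path are assumptions on my part; adjust the path if your install keeps the file elsewhere.

```shell
# Hypothetical helper: print the use-ipv4 / use-ipv6 lines (commented out or
# not) from an avahi-daemon.conf so reporters can state which protocols are on.
#   $1 = path to the config file (optional; default path is an assumption)
show_avahi_ip_settings() {
    grep -E '^[[:space:]]*#?[[:space:]]*use-ipv[46]=' \
        "${1:-/etc/avahi/avahi-daemon.conf}"
}

# Usage:
#   show_avahi_ip_settings
```

Flipping use-ipv6 between yes and no (and restarting avahi-daemon each time) should be enough to run the alternating experiment described above.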