gdm hangs altogether after timeout on the gdm socket
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
gdm | Fix Released | Critical | |
gdm (Ubuntu) | Fix Released | High | Ubuntu Desktop Bugs |
Dapper | Invalid | High | Unassigned |
Bug Description
Binary package hint: gdm
Since upgrading to Dapper final, we have experienced frequent breakage of gdm on an amd64 (64-bit Dapper) XDMCP server that regularly serves around 40 clients.
Reproducibility: it happens when, for example, many people log out at the same time (about once every few days). Afterwards gdm must be killed and started manually, which kills all the existing sessions as well.
Symptoms: the slave gdm processes continue to work, but the main gdm process does not spawn new slaves, does not ping the existing ones every 15s as it normally does (according to the debug syslog), and does not respond to TERM (it must be KILLed). It behaves as if it were waiting for something (a race?).
The logs show that the only difference between normal operation and the bug is the timeout on the gdm socket (I will attach the full log):
Sep 22 14:16:40 [gdm] Sending LOGGED_IN == 0 for slave 317
Sep 22 14:16:40 [gdm] Timeout occurred for sending message LOGGED_IN 317 0
What might be the reason? In slave.c:
&rfds), but select apparently returns an error, since the timeout never actually elapses (otherwise there would have to be 10s between sending the message and the timeout).
PS. I compiled gdm with added lines for tracing the message sending (in daemon/slave.c) and will post the results if they are relevant:
@@ -2767,6 +2766,7 @@
if (in_usr2_signal > 0) {
+ int select_retval;
@@ -2775,9 +2775,10 @@
- if (select (d->slave_
+ if ((select_retval = select (d->slave_
+ if (select_retval < 0) gdm_debug("TRACE (%s,%d): select returned errno %d (%s)",_
} else {
@@ -2787,6 +2788,7 @@
}
+ gdm_debug ("TRACE (%s,%d): Passed gdm_slave_send cycle, i=%d, in_usr2_signal=%d, wait_for_ack=%d, gdm_got_
}
if G_UNLIKELY (wait_for_ack &&
Changed in gdm:
status: Unknown → Unconfirmed
Changed in gdm:
status: Unconfirmed → Rejected
Changed in gdm:
status: Rejected → Fix Released
Changed in gdm:
importance: Unknown → Critical
This is the log from the point where the error shows up until the restart (it happened on Friday afternoon; gdm was restarted on Monday). Note that the pinging from the main gdm process is no longer present. The "Fatal X error detected." message also appears under normal conditions and hence is not the cause of the bug.