localhost connection timeouts after start of eucalyptus
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
eucalyptus (Ubuntu) | Fix Released | Medium | Daniel Nurmi | | |
Bug Description
This is on Ubuntu Karmic server.
After starting eucalyptus (sudo start eucalyptus), any TCP connection attempt on the loopback interface (the connect(2) system call) to a port that has no listener hangs instead of returning immediately with ECONNREFUSED.
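A minimal sketch for reproducing the symptom described above, using only the Python standard library. The helper names `probe` and `unused_port` are made up for illustration and are not part of eucalyptus or any tool mentioned in this report. On an unaffected machine the probe should report "refused" almost immediately; on an affected one it would be expected to report "timeout".

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as 'connected', 'refused' or 'timeout'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        # No answer reached the socket before the deadline (the reported hang).
        return "timeout"
    except ConnectionRefusedError:
        # The normal outcome for a port without a listener (ECONNREFUSED).
        return "refused"
    finally:
        sock.close()


def unused_port() -> int:
    """Ask the kernel for a free loopback port, then release it again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print(probe("127.0.0.1", unused_port()))
```

Running it once before and once after "sudo start eucalyptus" gives a quick before/after comparison without needing telnet.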
The problem seems to be due to a rule added at startup to the "nat" table of iptables:
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
801 48085 MASQUERADE all -- any any anywhere !172.19.0.0/16
That rule masquerades every connection, even locally generated ones, unless the destination is in 172.19.0.0/16. It could have other side effects, but the one that causes connection hangs is quite noticeable and affects many services.
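To spot rules of this kind, a listing in the format shown above can be scanned for MASQUERADE entries that are not restricted by interface or source. This is only a sketch: the function name `broad_masquerade_rules` is hypothetical, and it assumes the nine-column layout printed by a verbose iptables listing (pkts, bytes, target, prot, opt, in, out, source, destination).

```python
from typing import List

# The POSTROUTING listing quoted in this report.
SAMPLE = """\
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
801 48085 MASQUERADE all -- any any anywhere !172.19.0.0/16
"""


def broad_masquerade_rules(listing: str) -> List[str]:
    """Return MASQUERADE rows that match any interface and any source."""
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the chain line, the column header and non-MASQUERADE rules.
        if len(fields) < 9 or fields[2] != "MASQUERADE":
            continue
        in_if, out_if, source = fields[5], fields[6], fields[7]
        # Assumption: "any"/"anywhere" appear in symbolic listings,
        # "*"/"0.0.0.0/0" in numeric (-n) listings.
        if (in_if in ("any", "*")
                and out_if in ("any", "*")
                and source in ("anywhere", "0.0.0.0/0")):
            flagged.append(line.strip())
    return flagged


if __name__ == "__main__":
    for rule in broad_masquerade_rules(SAMPLE):
        print("unrestricted MASQUERADE:", rule)
```

A rule flagged this way also applies to traffic on the loopback interface, which is the case this report is about.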
It could also be a kernel bug. Here is the pcap trace of a "telnet localhost" with no listener:
2997.869330 10.10.10.38 -> 127.0.0.1 TCP 35140 > telnet [SYN] Seq=0 Win=32792 Len=0 MSS=16396 TSV=6901389 TSER=0 WS=7 12:43
2997.869351 127.0.0.1 -> 127.0.0.1 TCP telnet > 35140 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
and we see retransmissions of that until connect(2) times out. Whereas if something is listening:
3432.999156 10.10.10.38 -> 127.0.0.1 TCP 57717 > telnet [SYN] Seq=0 Win=32792 Len=0 MSS=16396 TSV=6944902 TSER=0 WS=7 12:55
3432.999183 127.0.0.1 -> 127.0.0.1 TCP telnet > 57717 [SYN, ACK] Seq=0 Ack=0 Win=32768 Len=0 MSS=16396 TSV=6944902 TSER=6944902 WS=7
3432.999203 10.10.10.38 -> 127.0.0.1 TCP 57717 > telnet [ACK] Seq=1 Ack=1 Win=32896 Len=0 TSV=6944902 TSER=6944902
3432.999366 10.10.10.38 -> 127.0.0.1 TELNET Telnet Data ...
3432.999384 127.0.0.1 -> 127.0.0.1 TCP telnet > 57717 [ACK] Seq=1 Ack=24 Win=256 Len=0 TSV=6944902 TSER=6944902
It's still masqueraded, but the connection goes through.
Also, I don't like the fact that the whole iptables configuration is wiped out as soon as "eucalyptus" is started. (Note that the default UEC installation installs ufw, whose configuration is wiped that way.)
Those tables are installed via a call to iptables-restore on a file generated on the fly:
root 1374 1 0 17:33 ? 00:00:00 apache2 -f /var/run/
107 1420 1374 0 17:33 ? 00:00:00 apache2 -f /var/run/
107 3497 1420 0 17:34 ? 00:00:00 sh -c ///usr/
root 3498 3497 0 17:34 ? 00:00:00 /bin/sh - /sbin/iptables-
(it's called several times), upon some POST http://
$ uname -srvm
Linux 2.6.31-17-server #54-Ubuntu SMP Thu Dec 10 18:06:56 UTC 2009 x86_64
$ dpkg -l | grep euca
ii euca2ools 1.0+bzr20091007
ii eucalyptus-cc 1.6~bzr931-
ii eucalyptus-cloud 1.6~bzr931-
ii eucalyptus-common 1.6~bzr931-
ii eucalyptus-gl 1.6~bzr931-
ii eucalyptus-
ii eucalyptus-sc 1.6~bzr931-
ii eucalyptus-walrus 1.6~bzr931-
ii libeucalyptus-
Dan- You're the routing/iptables expert here... Any idea what's going on? Is this something we can solve?