localhost connection timeouts after start of eucalyptus

Bug #510086 reported by Stephane Chazelas
This bug affects 1 person
Affects: eucalyptus (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Daniel Nurmi
Milestone: (none)

Bug Description

This is on ubuntu karmic server.

After starting eucalyptus (sudo start eucalyptus), any TCP connection attempt on the loopback interface (the connect(2) system call) to a port that has no listener hangs instead of returning immediately with ECONNREFUSED.
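The symptom described above can be checked without telnet. The following is a minimal sketch, not part of the original report: the function name probe_port and the 3-second timeout are illustrative assumptions. On an unaffected host, probing a loopback port with no listener should report "refused" almost immediately; on an affected host it would be expected to report "timeout".

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt as 'open', 'refused', or 'timeout'.

    probe_port is a hypothetical helper name; the timeout value is arbitrary.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The kernel answered with RST: the normal result for a closed port.
        return "refused"
    except socket.timeout:
        # No usable answer arrived in time: the behaviour reported in this bug.
        return "timeout"
    finally:
        sock.close()


if __name__ == "__main__":
    # Port 2 is assumed to have no listener on the machine under test.
    print(probe_port("127.0.0.1", 2))
```

Only the three outcomes relevant to this report are handled; any other socket error propagates to the caller.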

The problem seems to be due to a rule added at startup to the "nat" table of iptables:

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
  801 48085 MASQUERADE all -- any any anywhere !172.19.0.0/16

That rule masquerades every connection, even locally generated ones. It could have other side effects, but the one that causes connection hangs is quite noticeable and affects many services.

It could also be a kernel bug. Here is the pcap trace of a "telnet localhost":

2997.869330 10.10.10.38 -> 127.0.0.1 TCP 35140 > telnet [SYN] Seq=0 Win=32792 Len=0 MSS=16396 TSV=6901389 TSER=0 WS=7 12:43
2997.869351 127.0.0.1 -> 127.0.0.1 TCP telnet > 35140 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

and we see retransmissions of that exchange until connect(2) times out, whereas if something is listening:

 3432.999156 10.10.10.38 -> 127.0.0.1 TCP 57717 > telnet [SYN] Seq=0 Win=32792 Len=0 MSS=16396 TSV=6944902 TSER=0 WS=7 12:55
3432.999183 127.0.0.1 -> 127.0.0.1 TCP telnet > 57717 [SYN, ACK] Seq=0 Ack=0 Win=32768 Len=0 MSS=16396 TSV=6944902 TSER=6944902 WS=7
3432.999203 10.10.10.38 -> 127.0.0.1 TCP 57717 > telnet [ACK] Seq=1 Ack=1 Win=32896 Len=0 TSV=6944902 TSER=6944902
3432.999366 10.10.10.38 -> 127.0.0.1 TELNET Telnet Data ...
3432.999384 127.0.0.1 -> 127.0.0.1 TCP telnet > 57717 [ACK] Seq=1 Ack=24 Win=256 Len=0 TSV=6944902 TSER=6944902

It's still masqueraded, but the connection goes through.
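To spot the masquerading in a longer capture, the trace lines can be filtered mechanically. This is a rough sketch that assumes the "time src -> dst ..." summary layout shown in the traces above; the regular expression and the name is_masqueraded_loopback are illustrative, not taken from any tool.

```python
import ipaddress
import re

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")

# Assumed layout: "<time> <src> -> <dst> <rest of summary>".
LINE_RE = re.compile(r"^\s*\d+\.\d+\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s")


def is_masqueraded_loopback(line: str) -> bool:
    """True if a packet is addressed to loopback but has a non-loopback source."""
    match = LINE_RE.match(line)
    if match is None:
        return False
    try:
        src = ipaddress.ip_address(match.group("src"))
        dst = ipaddress.ip_address(match.group("dst"))
    except ValueError:
        # Not an IP address pair (for example a resolved host name): skip it.
        return False
    return dst in LOOPBACK and src not in LOOPBACK
```

Applied to the traces above, the SYN lines from 10.10.10.38 to 127.0.0.1 are flagged, while the replies from 127.0.0.1 to 127.0.0.1 are not.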

Also, I don't like the fact that the whole iptables configuration is wiped out as soon as "eucalyptus" is started (note that the default UEC installation installs ufw, whose configuration is wiped that way).

Those tables are installed via a call to iptables-restore on a file generated on the fly:

root 1374 1 0 17:33 ? 00:00:00 apache2 -f /var/run/eucalyptus/httpd-cc.conf -D FOREGROUND
107 1420 1374 0 17:33 ? 00:00:00 apache2 -f /var/run/eucalyptus/httpd-cc.conf -D FOREGROUND
107 3497 1420 0 17:34 ? 00:00:00 sh -c ///usr/lib/eucalyptus/euca_rootwrap iptables-restore < /tmp/euca-ipt-WF6Jg9
root 3498 3497 0 17:34 ? 00:00:00 /bin/sh - /sbin/iptables-restore

(It's called several times, upon some POST http://10.10.10.38:8774/axis2/services/EucalyptusCC HTTP/1.1 request issued by I don't know what.)

$ uname -srvm
Linux 2.6.31-17-server #54-Ubuntu SMP Thu Dec 10 18:06:56 UTC 2009 x86_64
$ dpkg -l | grep euca
ii euca2ools 1.0+bzr20091007-0ubuntu1.1 managing cloud instances for Eucalyptus
ii eucalyptus-cc 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Clu
ii eucalyptus-cloud 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Clo
ii eucalyptus-common 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Com
ii eucalyptus-gl 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Log
ii eucalyptus-java-common 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Com
ii eucalyptus-sc 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Sto
ii eucalyptus-walrus 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Wal
ii libeucalyptus-commons-ext-java 0.4.2-0ubuntu1 Eucalyptus commons external Java library

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Dan- You're the routing/iptables expert here... Any idea what's going on? Is this something we can solve?

Changed in eucalyptus (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Daniel Nurmi (nurmi)
Daniel Nurmi (nurmi) wrote :

All,

I'm unable to reproduce this problem with (pre) Eucalyptus 1.6.2; with a running active CC in MANAGED mode (which has the masq rule in place), I can telnet to localhost port 22 (for example) without a problem, and sshing to localhost shows that I'm logged in from 'localhost' (implying that there is no nat going on).

My understanding of current iptables implementations is that, for lo, the POSTROUTING chain is not traversed (i.e. there is a special path just for lo), which incidentally is why the CC has to install DNAT rules in the OUTPUT chain (which is traversed when traffic originates from lo).

Stephane Chazelas (stephane-chazelas) wrote : Re: [Bug 510086] Re: localhost connection timeouts after start of eucalyptus

2010-01-30 02:37:08 -0000, Daniel Nurmi:
[...]
> I'm unable to reproduce this problem with (pre) Eucalyptus 1.6.2; with a
> running active CC in MANAGED mode (which has the masq rule in place), I
> can telnet to localhost port 22 (for example) without a problem, and
> sshing to localhost shows that I'm logged in from 'localhost' (implying
> that there is no nat going on).
[...]

Hi Daniel,

the thing is, it's for ports where there's no listener that
there's a problem. Obviously, you've got a listener here as you
managed to ssh.

Try: telnet localhost 2
for instance (assuming you've got no service on port 2), you'll
see telnet hanging instead of returning with a "connection
refused" error message (that is, if you can reproduce the bug,
but I'm quite confident that it's easily reproducible as it
happened after a fresh install of Ubuntu Cloud with default
parameters)

Best regards,
Stephane

Daniel Nurmi (nurmi) wrote :

We've modified the MASQ rule to use source !127.0.0.0/8, which resolves this problem (in Lucid)
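The effect of the added source match can be illustrated with a simplified model of rule matching. This sketch is not netfilter code: the MasqRule class is hypothetical, and it assumes the fixed rule keeps the original !172.19.0.0/16 destination match alongside the new !127.0.0.0/8 source match, which this report does not confirm.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasqRule:
    """Toy model of a rule that matches on negated source/destination networks."""

    not_source: Optional[str] = None        # match only if source is outside this network
    not_destination: Optional[str] = None   # match only if destination is outside this network

    def matches(self, src: str, dst: str) -> bool:
        source = ipaddress.ip_address(src)
        destination = ipaddress.ip_address(dst)
        if self.not_source and source in ipaddress.ip_network(self.not_source):
            return False
        if self.not_destination and destination in ipaddress.ip_network(self.not_destination):
            return False
        return True


# Rule as shown in the bug description: everything not destined for 172.19.0.0/16.
original = MasqRule(not_destination="172.19.0.0/16")
# Assumed shape of the fixed rule: same destination match, plus source !127.0.0.0/8.
fixed = MasqRule(not_source="127.0.0.0/8", not_destination="172.19.0.0/16")

print(original.matches("127.0.0.1", "127.0.0.1"))  # loopback traffic matches the original rule
print(fixed.matches("127.0.0.1", "127.0.0.1"))     # and is excluded by the source match
```

In this model, traffic from other sources to destinations outside 172.19.0.0/16 still matches the fixed rule, so only locally generated loopback traffic is exempted.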

Dustin Kirkland  (kirkland) wrote :

Marking fix-released against Lucid. Please reopen if you can reproduce this against the latest Lucid code.

Changed in eucalyptus (Ubuntu):
status: New → Fix Released