[alpha3] Re-registered NC fails to be detected.

Bug #530091 reported by Torsten Spindler
This bug affects 1 person
Affects: eucalyptus (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Thierry Carrez

Bug Description

I installed a frontend and a node with the alpha3 server CD. The frontend works fine, but
$ euca-describe-availability-zones verbose
does not list any node. I tried
$ euca_conf --discover-nodes --no-rsync
but the node was not discovered. I then added it by hand to /etc/eucalyptus/eucalyptus.conf and copied the keys manually. However, the node is still not part of the cloud. On the node I see the following eucalyptus processes:

ubuntu@node01:~$ ps aux | grep eucalyptus
root 1211 0.0 0.0 4972 2176 ? Ss 15:52 0:00 apache2 -f /var/run/eucalyptus/httpd-nc.conf -D FOREGROUND
108 1260 0.0 0.0 45288 3988 ? Sl 15:52 0:00 apache2 -f /var/run/eucalyptus/httpd-nc.conf -D FOREGROUND
root 1444 0.0 0.0 2232 1004 ? Ss 15:52 0:00 avahi-publish -s torstentest node _eucalyptus._tcp 8775 txtvers=1 protovers=1.5.0 type=node

Thierry Carrez (ttx) wrote :

There are some changes in the node registration process. It should now work automatically and not require "discover-nodes". You can check the CC's /var/log/eucalyptus/registration.log to see whether it detected the NC and, if so, what the euca_conf --register-nodes command returned.

From your last attempts, it looks like the key from the CC was not distributed to /var/lib/eucalyptus/.ssh/authorized_keys on the NC during the install process. Could you check the contents of that file on the NC?

Was the rest of the NC install preseeded? Or did you have to manually enter username/password and other install details? I suspect you started the NC install too early and no preseed was yet available from the CC.

Changed in eucalyptus (Ubuntu):
status: New → Incomplete
Torsten Spindler (tspindler) wrote :

I'm pretty sure I waited for the front-end to be ready, i.e. until euca-describe-availability-zones verbose returned good output. I attach the registration log from the front-end.

On the node, the authorized_keys file contains a good key for the frontend. As user eucalyptus on the front-end I can do a passwordless login to node01 with ssh.

I can re-install the node and see if it gets any better.

$ sudo cat authorized_keys
[sudo] password for ubuntu:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAt9C17TNGYpqIyCT74LtzXE1fpVluKGCIql8HBufmux7a5/AdVqa+b4OMs+bkNbQRPJiaUmGKyRioHX9vZwngN8FlDxI35QG5keEd/flI0ltXghnOVBoHXh9QVc2ux78GzAu+u0bxI9En4DfETvidgTcVmHNSUJlT270oQX7JiXj0bfK87S/d5vzA4pZInODFYilX+RHCgaZocgYkYP2cGqH2hFR2KSmbzVgkeV0Axk9FQAMSmrrPwrnenYmC9oobo0LQ8ZUp/1STruQVxWkLph8wfPY0JRrw5PYmsYIf8hys9t5vqhNCKUs29tCJDaSy/HdcXYB/GwgFgSjkSaPE5Q== eucalyptus@frontend
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAt9C17TNGYpqIyCT74LtzXE1fpVluKGCIql8HBufmux7a5/AdVqa+b4OMs+bkNbQRPJiaUmGKyRioHX9vZwngN8FlDxI35QG5keEd/flI0ltXghnOVBoHXh9QVc2ux78GzAu+u0bxI9En4DfETvidgTcVmHNSUJlT270oQX7JiXj0bfK87S/d5vzA4pZInODFYilX+RHCgaZocgYkYP2cGqH2hFR2KSmbzVgkeV0Axk9FQAMSmrrPwrnenYmC9oobo0LQ8ZUp/1STruQVxWkLph8wfPY0JRrw5PYmsYIf8hys9t5vqhNCKUs29tCJDaSy/HdcXYB/GwgFgSjkSaPE5Q== eucalyptus@frontend

Thierry Carrez (ttx) wrote :

Everything looks good on the install side... Maybe try sudo euca_conf --deregister-nodes "IP" && sudo euca_conf --register-nodes "IP" and see if you get any error?

If it succeeds, it's probably an issue on the NC side. Do you get anything in the NC logs?

Torsten Spindler (tspindler) wrote : Re: [Bug 530091] Re: [alpha3] NC fails to be detected.

I tried the deregister and register, but there was no change in the overall
situation.

For the logs on the node controller, I don't see nc.log:

ubuntu@node01:/var/log/eucalyptus$ ls
axis2c.log euca_test_nc.log httpd-nc_error_log

Thierry Carrez (ttx) wrote : Re: [alpha3] NC fails to be detected.

Anything in euca_test_nc.log or httpd-nc_error_log?

Torsten Spindler (tspindler) wrote : Re: [Bug 530091] Re: [alpha3] NC fails to be detected.

Nothing of interest in there. I attach the two logs.

Dustin Kirkland  (kirkland) wrote : Re: [alpha3] NC fails to be detected.

I'm confirming this.

I have a new UEC setup this morning from the current archive. CLC+WC+CC+SC, and 4xNC.

The CLC was definitely up and running and serving the preseed.conf before the NCs were installed.

The NCs installed correctly, and they have the CLC's ssh key in /var/lib/eucalyptus/.ssh/authorized_keys.

But none of them are registering automatically.

I think I've seen this quite a bit around Alpha3, and when I mentioned it, we chalked it up to preseed/netboot/timing issues.

In any case, this is not ideal. And it looks like a regression to me, as this was working very, very well in Portland.

Changed in eucalyptus (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High
Changed in eucalyptus (Ubuntu):
assignee: nobody → Dustin Kirkland (kirkland)
Dustin Kirkland  (kirkland) wrote :

I believe I've tracked down where this problem was introduced:

http://bazaar.launchpad.net/~ubuntu-core-dev/eucalyptus/ubuntu/revision/909

My nodes don't have [ -f "/etc/eucalyptus/eucalyptus-nc.conf" ], so the publication job isn't starting.

This is because the debconf key/value for eucalyptus/cluster-name does not exist on my NC.
  $ echo GET eucalyptus/cluster-name | sudo debconf-communicate

And thus the eucalyptus-nc.postinst isn't able to populate that file.

I'm working on a fix.
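
For anyone reproducing this check, here is a minimal sketch of how the debconf reply could be interpreted. It assumes that debconf-communicate answers with a numeric status code followed by the value, and that a leading 0 means the key exists. The helper name debconf_reply_value is hypothetical and not part of any eucalyptus package.

```shell
#!/bin/bash
# Sketch only; debconf_reply_value is a made-up helper name.
# Assumption: a debconf-communicate reply looks like "<code> <text>",
# where code 0 means success and <text> is the stored value.

# debconf_reply_value REPLY
# Prints the value and returns 0 if the status code is 0, else returns 1.
debconf_reply_value() {
    local reply="$1"
    local code="${reply%% *}"    # everything before the first space
    if [ "$code" != "0" ]; then
        return 1
    fi
    if [ "$reply" = "0" ]; then  # success, but the value is empty
        printf '\n'
        return 0
    fi
    printf '%s\n' "${reply#* }"  # everything after the first space
}

# Example use on an NC (commented out so that sourcing this file is harmless):
#   reply=$(echo GET eucalyptus/cluster-name | sudo debconf-communicate)
#   if name=$(debconf_reply_value "$reply"); then
#       echo "cluster-name is set to: $name"
#   else
#       echo "cluster-name is not set; eucalyptus-nc.conf will not be populated"
#   fi
```

Keeping the parsing in a function separate from the debconf call makes it possible to try the logic on a machine that has no eucalyptus debconf entries at all.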

Thierry Carrez (ttx) wrote :

@Dustin: In Torsten's logs, the publication is working all right, since the node is detected in the CC's registration log:

2010-02-26 15:10:29+01:00 | 2758 -> Calling node torstentest node 192.168.1.106
2010-02-26 15:10:30+01:00 | 2758 -> euca_conf --register-nodes returned 0

Also, eucalyptus/cluster-name is normally present in the CC preseed?

Dustin Kirkland  (kirkland) wrote :

Thierry,

Agreed. My issue was actually different. I filed it as Bug #530937, and I have committed a fix for my issue to the tree.

I'm not sure what's going on with Torsten's issue. I'm going to unassign myself from this bug for now.

Changed in eucalyptus (Ubuntu):
assignee: Dustin Kirkland (kirkland) → nobody
Torsten Spindler (tspindler) wrote :

I confirm that on my cloud there is a eucalyptus-nc.conf on the node and it reads
CC_NAME="torstentest"

I will reinstall the node controller next and see if the problem persists.

Torsten Spindler (tspindler) wrote :

After reinstallation nothing changed. I removed quiet and splash from the kernel boot command line and see the following report:
init: eucalyptus-network (lo) main process (704) killed by TERM signal

Thierry Carrez (ttx) wrote :

Looking at the authorized_keys, this might be a node re-registration issue. Did you register the same node IP with the CC in the past? When you manually deregistered and re-registered the node, did you get any error? After deregistering, what do you have in /var/lib/eucalyptus/nodes.list? Could you reproduce this on a completely new setup (reinstall the CC)? (I can't.)

The process is:
0/ NC Installer copies CC key to NC authorized_keys
1/ NC publishes its existence
2/ CC picks up the publication
3/ CC runs euca_conf --register-nodes
4/ euca_conf --register-nodes syncs up the eucalyptus CC keys to the NC /var/lib/eucalyptus/keys
5/ euca_conf --register-nodes adds IP to the CC nodes.list
6/ NC picks up keys and starts up, starts writing up messages to nc.log

From your logs it looks like steps 0-3 are working all right and that step 6 never happens. Could you tell where it stops?
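
One way to narrow down where the sequence stops is to look for the artifact each step should leave behind. The following is a rough sketch, not a tool shipped with UEC. The paths are the ones mentioned in this report; the function names and the optional root prefix are made up for illustration, and treating "registration.log mentions the IP" as evidence for steps 2-3 is an assumption based on the log excerpt quoted earlier.

```shell
#!/bin/bash
# Sketch only. The optional ROOT argument lets the checks run against a
# copied directory tree; leave it empty on a real machine (reading these
# paths may require root).

# has_content PATH: true for a non-empty file or a non-empty directory.
has_content() {
    local path="$1"
    if [ -d "$path" ]; then
        [ -n "$(ls -A "$path" 2>/dev/null)" ]
    else
        [ -s "$path" ]
    fi
}

# report LABEL PATH: print one status line for the given artifact.
report() {
    local label="$1" path="$2"
    if has_content "$path"; then
        echo "ok      $label"
    else
        echo "MISSING $label"
    fi
}

# check_nc [ROOT]: run on the node controller (steps 0, 4 and 6).
check_nc() {
    local root="${1:-}"
    report "step 0: CC key in authorized_keys" "$root/var/lib/eucalyptus/.ssh/authorized_keys"
    report "step 4: keys synced to NC"         "$root/var/lib/eucalyptus/keys"
    report "step 6: nc.log being written"      "$root/var/log/eucalyptus/nc.log"
}

# check_cc IP [ROOT]: run on the cluster controller (steps 2-3 and 5).
check_cc() {
    local ip="$1" root="${2:-}"
    if grep -qFw -- "$ip" "$root/var/log/eucalyptus/registration.log" 2>/dev/null; then
        echo "ok      steps 2-3: registration.log mentions $ip"
    else
        echo "MISSING steps 2-3: registration.log mentions $ip"
    fi
    if grep -qFw -- "$ip" "$root/var/lib/eucalyptus/nodes.list" 2>/dev/null; then
        echo "ok      step 5: nodes.list contains $ip"
    else
        echo "MISSING step 5: nodes.list contains $ip"
    fi
}

# Example (commented out): on the CC run  check_cc 192.168.1.106
#                          on the NC run  check_nc
```

The first MISSING line in step order is a reasonable place to start looking; it does not prove that the earlier steps completed correctly.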

Changed in eucalyptus (Ubuntu):
importance: High → Medium
status: Confirmed → Incomplete
Torsten Spindler (tspindler) wrote : Re: [Bug 530091] Re: [alpha3] NC fails to be detected.

On Wed, 2010-03-03 at 09:48 +0000, Thierry Carrez wrote:
> Looking at the authorized_keys, might be a node re-registration issue.
> Did you register the same Node IP with the CC in the past ?

Yes, it was registered before.

> When you
> deregistered/registered manually the node, did you get any error ?

Nope. But when I register the node it is not listed anywhere, e.g.

$ sudo euca_conf --register-nodes 192.168.1.106

INFO: We expect all nodes to have eucalyptus installed
in //var/lib/eucalyptus/keys for key synchronization.

Trying rsync to sync keys with "192.168.1.106"...The authenticity of
host '192.168.1.106 (192.168.1.106)' can't be established.
RSA key fingerprint is 5e:3f:83:e2:83:18:c5:f9:04:3f:a1:f2:9a:94:30:4e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.106' (RSA) to the list of known
hosts.
eucalyptus@192.168.1.106's password:
done.

ubuntu@frontend:/etc/eucalyptus$ grep -ri 106 *
grep: preseed/preseed.conf: Permission denied

> After
> deregister, what do you have in /var/lib/eucalyptus/nodes.list ?

sudo euca_conf --deregister-nodes 192.168.1.106
[sudo] password for ubuntu:
SUCCESS: removed node '192.168.1.106'
ubuntu@frontend:~$ cd /etc/eucalyptus/
ubuntu@frontend:/etc/eucalyptus$ grep 106 *
eucalyptus.conf:NODES="192.168.1.106"

I added it manually to eucalyptus.conf in the past; I will remove it.

> Could
> you reproduce on a full new setup (reinstall the CC)? (I can't)

I will do so: first reinstall the CC, then the NC, and report back if the
situation changes.

> The process is:
> 0/ NC Installer copies CC key to NC authorized_keys
> 1/ NC publishes its existence
> 2/ CC picks up the publication
> 3/ CC runs euca_conf --register-nodes
> 4/ euca_conf --register-nodes syncs up the eucalyptus CC keys to the NC /var/lib/eucalyptus/keys
> 5/ euca_conf --register-nodes adds IP to the CC nodes.list
> 6/ NC picks up keys and starts up, starts writing up messages to nc.log
>
> >From your logs it looks like 0-3 is working alright, and that 6 never
> happens. Could you tell where it stops ?
>
> ** Changed in: eucalyptus (Ubuntu)
> Importance: High => Medium
>
> ** Changed in: eucalyptus (Ubuntu)
> Status: Confirmed => Incomplete
>

Thierry Carrez (ttx) wrote : Re: [alpha3] NC fails to be detected.

I think there is still an issue around re-registration of nodes; I fell into that hole once myself. We just need to reproduce it to pinpoint where it comes from. Please confirm that it works for you on a completely new install, so that we can rename this bug "Re-registered NC fails to be detected" :)

Torsten Spindler (tspindler) wrote :

On a fresh install of front-end and node controller the cloud works as expected:

ubuntu@frontend:/var/log/eucalyptus$ euca-describe-availability-zones verbose
AVAILABILITYZONE  torsten       192.168.1.103
AVAILABILITYZONE  |- vm types   free / max   cpu   ram  disk
AVAILABILITYZONE  |- m1.small   0002 / 0002    1   128     2
AVAILABILITYZONE  |- c1.medium  0002 / 0002    1   256     5
AVAILABILITYZONE  |- m1.large   0001 / 0001    2   512    10
AVAILABILITYZONE  |- m1.xlarge  0001 / 0001    2  1024    20
AVAILABILITYZONE  |- c1.xlarge  0000 / 0000    4  2048    20

Thierry Carrez (ttx)
summary: - [alpha3] NC fails to be detected.
+ [alpha3] Re-registered NC fails to be detected.
Changed in eucalyptus (Ubuntu):
status: Incomplete → Confirmed
Dustin Kirkland  (kirkland) wrote :

One thing we noticed in debian/registration/node:

# Check if node isn't already registered
. /etc/eucalyptus/eucalyptus.conf
for nip in "$NODES"; do
  if [ "${nip# }" == "${IP}" ]; then
    reglog "Node $IP is already registered."
    exit 1
  fi
done

This code doesn't support /var/lib/eucalyptus/nodes.list, but it seems like it should ...
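
A rough sketch of what such a check could look like if it also consulted nodes.list. This is not the change that was eventually committed. It assumes nodes.list holds whitespace-separated IP addresses, which this report does not confirm, and the function name node_is_registered is hypothetical.

```shell
#!/bin/bash
# Sketch only; not the code from debian/registration/node.
# Assumption: nodes.list contains whitespace-separated IP addresses.

# node_is_registered IP [CONF] [NODES_LIST]
# Returns 0 if IP appears in $NODES from CONF or in NODES_LIST, else 1.
node_is_registered() {
    local ip="$1"
    local conf="${2:-/etc/eucalyptus/eucalyptus.conf}"
    local list="${3:-/var/lib/eucalyptus/nodes.list}"
    local conf_nodes="" list_nodes="" nip

    # Source the config inside a command substitution (a subshell), so
    # nothing it defines leaks into the caller's environment.
    if [ -r "$conf" ]; then
        conf_nodes=$(NODES=""; . "$conf"; printf '%s' "$NODES")
    fi
    if [ -r "$list" ]; then
        list_nodes=$(cat "$list")
    fi

    # Left unquoted on purpose: word splitting yields one address per
    # iteration, so a NODES value with several addresses is handled.
    for nip in $conf_nodes $list_nodes; do
        if [ "$nip" = "$ip" ]; then
            return 0
        fi
    done
    return 1
}

# Example (commented out):
#   if node_is_registered "$IP"; then
#       echo "Node $IP is already registered."
#   fi
```

Iterating over the unquoted lists also means every address in a multi-node NODES value is compared individually, rather than the whole string being compared once.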

Changed in eucalyptus (Ubuntu):
assignee: nobody → Thierry Carrez (ttx)
Daniel Nurmi (nurmi) wrote :

It looks like the problem is related to the fact that euca_conf --deregister will de-register a node, but the uec_component_listener is not informed when de-registration happens. Thus, if the component listener is restarted, it will re-register the node as soon as it sees the avahi publication of the node.

One possible avenue here would be to modify euca_conf in the UEC to send a signal of some sort (perhaps, by putting a message into registration.log, which uec_component_listener reads periodically?), to inform the listener that a node has been de-registered and should not be re-registered.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu23

---------------
eucalyptus (1.6.2-0ubuntu23) lucid; urgency=low

  * debian/eucalyptus-udeb.postinst, debian/eucalyptus-udeb.templates:
    add a debconf/preseed option to skip the euca_find_component
    checks in the installer
  * debian/eucalyptus-network.upstart: only rewrite ipaddr.conf if
    it does not exist (admin can force a rewrite by removing it);
    only write the pertinent addrs to ipaddr.conf, LP: #523126
  * debian/registration/node: ensure that nodes.list is used in
    building the $NODES ip list, might (in part) solve LP: #530091
  * debian/rules: drop the install-init --noscripts option, as this
    is not what we want and appears to have arrived from a bad
    copy-n-paste; this fix ensures that eucalyptus-nc is started
    on package install, LP: #545606
 -- Dustin Kirkland <email address hidden> Wed, 24 Mar 2010 18:10:05 -0700

Changed in eucalyptus (Ubuntu):
status: Confirmed → Fix Released