[maverick] Inconsistent certificates prevent CC to start correctly (no cc.log)

Bug #627963 reported by Thierry Carrez
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
Eucalyptus           Fix Released  Undecided   Unassigned
eucalyptus (Ubuntu)  Fix Released  High        Dave Walker
Maverick             Fix Released  High        Dave Walker

Bug Description

Maverick/20100831/amd64 beta candidate. Topology 1 (CLC/CC/SC/Walrus on the same machine)

Registration appears to work correctly (euca_conf --register-* returns 0 (SUCCESS)). However, the nodes can't register (with symptoms in bug 585108). Looking deeper, cloud-cluster.log is empty, and cloud-output.log complains every 20 sec about an expected certificate being different from a received certificate (see attached file).

I reproduced it on two installs.

Revision history for this message
Thierry Carrez (ttx) wrote :

Setting to critical since it prevents usage of Eucalyptus completely, I know of no manual workaround, and it seems reproducible.

Changed in eucalyptus (Ubuntu):
importance: Undecided → Critical
Thierry Carrez (ttx) wrote :

Will try on i386 since that was working on Friday...

tags: added: iso-testing
Thierry Carrez (ttx) wrote :

The symptoms in bug 585108 can also be observed in this case, but I suspect this is just a consequence of not having the CC started correctly.

Thierry Carrez (ttx) wrote :

Did not reproduce in my i386 test. Might be pure luck, or truly arch-related.

Thierry Carrez (ttx) wrote :

On systems exhibiting the bug: cluster registration appears to be successful (euca_conf --list-clusters will show a cluster), but no cc.log is produced.

C de-Avillez (hggdh2) wrote :

I just tested it (and will re-test, now with a bit more objectivity). It seems Eucalyptus (2.0~bzr1233-0ubuntu2) is storing certificates in two different places, at least in topology 1:

/var/lib/eucalyptus/keys for the CLC (and, probably, the Walrus, perhaps also the SC)
/var/lib/eucalyptus/keys/<Cluster Name> for the CC

And so, when trying to open a session, the CLC cannot verify the CC...

I copied the ./<Cluster Name> certificates everywhere, and was able to work.

Changed in eucalyptus (Ubuntu):
status: New → Confirmed
C de-Avillez (hggdh2) wrote :

Although I did it the hard way, I am almost sure that copying the CLC certs over to ./<Cluster Name> would do the trick.

Thierry Carrez (ttx) wrote :

Workaround:
sudo cp /var/lib/eucalyptus/keys/<clustername>/* /var/lib/eucalyptus/keys/
sudo service eucalyptus stop; sleep 10
sudo service eucalyptus start

Changed in eucalyptus (Ubuntu):
importance: Critical → High
Thierry Carrez (ttx) wrote :

On two consecutive amd64 installs, Carlos had one failing, then one working. So it does not seem to be arch-related, just some kind of race that i386 happens to win more often than amd64.

Thierry Carrez (ttx)
summary: - CC doesn't start correctly, CLC struggles with certificates
+ [maverick] Inconsistent certificates prevent CC to start correctly (no
+ cc.log)
Dave Walker (davewalker) wrote :

With three installs in top1:
1) I believe I saw this issue, but didn't keep the install long enough to be exactly sure.
2) I reproduced this issue
3) Did not reproduce this issue, worked as expected.

Certainly seems like a race, or other inconsistency.

Dave Walker (davewalker) wrote :

^^ This was performed on amd64 of the current beta candidate.

Dmitrii Zagorodnov (dmitrii) wrote :

Directory /var/lib/eucalyptus/keys/<Cluster Name> is only relevant on the CLC, since it needs to have keys for multiple clusters. For all other components the keys are in /var/lib/eucalyptus/keys. Hence, if CLC and a CC are co-located, you will see the same keys in two places.

The cloud key is created when the CLC starts for the first time. Cluster and node keys are created (by the CLC) when the cluster is registered. The Eucalyptus version of euca_conf --register-cluster or --register-node attempts to ensure that the keys for two hosts match by using rsync and rcp. The UEC version of the file attempts to do the same but with somewhat different code.

I am not sure how one ends up with mismatched keys on a fresh install. (If synchronization didn't work you would have no keys on the downstream side, such as CC or NC.) If you run into this again, I would look at modification times and checksums of all *.pem files under the /var/lib/eucalyptus/keys tree.
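
The check suggested above (modification times and checksums of every *.pem file under the keys tree) could be scripted roughly as follows. This is only a sketch: `list_pem_fingerprints` is a hypothetical helper name, not part of Eucalyptus, and it assumes GNU `find`, `stat -c`, and `md5sum` are available. The keys directory is passed as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper (not part of Eucalyptus): print the modification time,
# MD5 checksum, and path of every *.pem file under a keys tree.
# ASSUMPTION: GNU coreutils (stat -c, md5sum) and find are installed.
list_pem_fingerprints() {
    keys_dir=$1
    find "$keys_dir" -type f -name '*.pem' | sort | while IFS= read -r pem; do
        mtime=$(stat -c '%y' "$pem")            # last modification time
        sum=$(md5sum "$pem" | cut -d' ' -f1)    # checksum only, without the file name
        printf '%s  %s  %s\n' "$mtime" "$sum" "$pem"
    done
}
```

Run as root, `list_pem_fingerprints /var/lib/eucalyptus/keys` prints one line per key file. Files that share a name but show different checksums or noticeably different modification times are the ones worth a closer look.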

C de-Avillez (hggdh2) wrote :

I have just reinstalled from scratch (i.e., full Ubuntu install + Eucalyptus all-in-one). I just installed the CLC+Walrus+CC/SC.

Upon logging in, I verified the /var/lib/eucalyptus/keys/*.pem to be the same as /var/lib/eucalyptus/keys/UEC-TEST1/*.pem. CC was up and running and on certificate errors.

I then rebooted. After logging in I found that the ./UEC-TEST1/*.pem files had been regenerated, and I now have the certificate error.

Full logs have been uploaded to bzr: /bazaar.launchpad.net/~hggdh2/+junk/uec-qa/, revision 48.

I will now check the workaround.

C de-Avillez (hggdh2) wrote :

er. Second paragraph above: "CC was up and running and *NO* certificate errors."

C de-Avillez (hggdh2) wrote :

ubuntu@cempedak:/var/lib/eucalyptus$ sudo ls -lR keys
keys:
total 64
-rw-r--r-- 1 eucalyptus eucalyptus 1147 2010-09-01 12:56 cloud-cert.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1675 2010-09-01 12:55 cloud-pk.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1151 2010-09-01 12:56 cluster-cert.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1675 2010-09-01 12:56 cluster-pk.pem
-rw-r--r-- 1 eucalyptus eucalyptus 25476 2010-09-01 12:56 euca.p12
-rwxr-xr-x 1 eucalyptus eucalyptus 2841 2010-08-25 18:06 nc-client-policy.xml
-rw-r--r-- 1 eucalyptus eucalyptus 1151 2010-09-01 12:56 node-cert.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1679 2010-09-01 12:56 node-pk.pem
drwxr-xr-x 2 eucalyptus eucalyptus 4096 2010-09-01 12:56 UEC-TEST1
-rw-r--r-- 1 eucalyptus eucalyptus 512 2010-09-01 12:56 vtunpass

keys/UEC-TEST1:
total 24
-rw-r--r-- 1 eucalyptus eucalyptus 1147 2010-09-01 12:56 cloud-cert.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1151 2010-09-01 12:56 cluster-cert.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1675 2010-09-01 12:56 cluster-pk.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1151 2010-09-01 12:56 node-cert.pem
-rw-r--r-- 1 eucalyptus eucalyptus 1675 2010-09-01 12:56 node-pk.pem
-rw-r--r-- 1 eucalyptus eucalyptus 512 2010-09-01 12:56 vtunpass
ubuntu@cempedak:/var/lib/eucalyptus$ sudo -i
root@cempedak:~# cd /var/lib/eucalyptus/keys/
root@cempedak:/var/lib/eucalyptus/keys# ls
cloud-cert.pem cluster-cert.pem euca.p12 node-cert.pem UEC-TEST1
cloud-pk.pem cluster-pk.pem nc-client-policy.xml node-pk.pem vtunpass
root@cempedak:/var/lib/eucalyptus/keys# md5sum cluster-*.pem
870e4f0faea7726121f3976101277e87 cluster-cert.pem
265099e170c1d536b2438269bb4250ac cluster-pk.pem
root@cempedak:/var/lib/eucalyptus/keys# md5sum UEC-TEST1/cluster-*.pem
dbd73366f8ec8d740f59caed6317ec96 UEC-TEST1/cluster-cert.pem
0bd80b3cc402f85d69764072180d92b4 UEC-TEST1/cluster-pk.pem
root@cempedak:/var/lib/eucalyptus/keys#
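
The manual md5sum comparison in the transcript above can be automated along these lines. This is a sketch under stated assumptions: `check_cluster_keys` is a hypothetical helper name, and it only compares same-named *.pem files between the top-level keys directory and the per-cluster subdirectory using `cmp`.

```shell
#!/bin/sh
# Hypothetical helper: compare each *.pem in <keys_dir>/<cluster>/ with the
# same-named file in <keys_dir>/. Prints one line per missing or differing
# file and returns non-zero if any problem was found.
check_cluster_keys() {
    keys_dir=$1
    cluster=$2
    status=0
    for pem in "$keys_dir/$cluster"/*.pem; do
        [ -e "$pem" ] || continue               # the glob matched nothing
        name=$(basename "$pem")
        if [ ! -f "$keys_dir/$name" ]; then
            echo "MISSING  $name"
            status=1
        elif ! cmp -s "$pem" "$keys_dir/$name"; then
            echo "MISMATCH $name"
            status=1
        fi
    done
    return $status
}
```

For the install shown above, the call would be `check_cluster_keys /var/lib/eucalyptus/keys UEC-TEST1`, run as root.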

C de-Avillez (hggdh2) wrote :

After copying the keys as in comment #9, I restarted Euca -- all is fine.

I then rebooted, and the problem is back.

Dmitrii Zagorodnov (dmitrii) wrote :

Carlos - thanks for looking into that! Cluster keys are only generated when --register-cluster is invoked. Trying to register an already registered cluster is supposed to be harmless, but it turns out that it is not. (I just filed https://bugs.launchpad.net/eucalyptus/+bug/628328 for that.) I wonder if, upon reboot, UEC registration logic tries to re-register the cluster? If so, that may be the best solution. In the meantime, we'll look into why registration is not idempotent on our side.

Dmitrii Zagorodnov (dmitrii) wrote :

I meant: "If so, avoiding the redundant registrations may be the best solution."

Changed in eucalyptus (Ubuntu Maverick):
milestone: none → ubuntu-10.10
Thierry Carrez (ttx) wrote :

Theoretically the autoregistration system checks (using euca_conf --list-clusters) if the cluster has already been registered, so it shouldn't retry. Maybe bug 628025 causes duplicate tries, though?
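
A check-before-register guard of the kind described here might look like the sketch below. It is not the actual UEC autoregistration code. Both function names are hypothetical, and the exact output format of `euca_conf --list-clusters` is not shown in this report, so the sketch simply assumes the cluster name appears somewhere in the listing as its own whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the given name appears as a
# whitespace-separated field in the listing read from stdin.
# ASSUMPTION: the listing contains the cluster name as its own field; the real
# output format of `euca_conf --list-clusters` is not shown in this report.
cluster_is_listed() {
    awk -v name="$1" '
        { for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Register only when the cluster is not already listed.
# $1 = cluster name, $2 = CC host address
register_cluster_once() {
    if euca_conf --list-clusters | cluster_is_listed "$1"; then
        echo "cluster $1 already registered, skipping"
    else
        euca_conf --register-cluster "$1" "$2"
    fi
}
```

Note that the check and the registration are two separate steps, so the guard is not atomic: two callers running at the same time can both see an empty listing and both go on to register.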

Thierry Carrez (ttx)
tags: added: server-mrs
Changed in eucalyptus (Ubuntu Maverick):
assignee: nobody → Dave Walker (davewalker)
Dave Walker (davewalker) wrote :

Marking fix released, as Dmitrii has confirmed this has landed in their branch.

Changed in eucalyptus:
status: New → Fix Released
Thierry Carrez (ttx) wrote :

I think the correct status in eucalyptus upstream should be "fix committed" since they haven't released with it yet (I think).

Changed in eucalyptus:
status: Fix Released → Fix Committed
Dave Walker (davewalker) wrote :

This is believed to have been fixed with bug #628328 (eucalyptus 2.0+bzr1239-0ubuntu1). If verification shows that it isn't, please re-open this bug.

Changed in eucalyptus (Ubuntu Maverick):
status: Confirmed → Fix Released
Thierry Carrez (ttx) wrote :

So this is still happening as of today's ISO. If for some reason euca_conf doesn't say the cluster is registered, re-registering it can apparently still duplicate keys. I did the following: installed UEC topology 1, upgraded to the latest version, and got a certificate mismatch again. Downgrading to medium as it seems to be rather rare.

Changed in eucalyptus (Ubuntu Maverick):
milestone: ubuntu-10.10 → none
importance: High → Medium
status: Fix Released → Confirmed
Thierry Carrez (ttx) wrote :

Dave reproduced it. It seems that running two concurrent registration processes (one for IPv4 and one for IPv6) can have bad side effects. Bug 628025 will be marked as a duplicate, since it's a symptom of the same problem.

Changed in eucalyptus (Ubuntu Maverick):
importance: Medium → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 2.0+bzr1241-0ubuntu2

---------------
eucalyptus (2.0+bzr1241-0ubuntu2) maverick; urgency=low

  * debian/registration/uec_component_listener.c: Ignore IPv6 avahi broadcasts.
    Both IPv4 and IPv6 broadcasts are dispatched from the components. A
    registration attempt was therefore made for each. As IPv6 isn't supported
    the processing when caught is now ignored. (LP: #627963, #628025)
 -- Dave Walker (Daviey) <email address hidden> Mon, 20 Sep 2010 16:57:24 +0100
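
The actual change was made in C in debian/registration/uec_component_listener.c and is not reproduced here. Purely as an illustration of the idea (act on IPv4 announcements only, skip everything else), a shell sketch could look like this. `is_ipv4` and `filter_ipv4` are hypothetical names, and the input is assumed to be one address per line.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for dotted-quad IPv4 addresses.
# The body is a subshell "( ... )", so changing IFS does not leak to the
# caller, and "exit" ends only that subshell with the given status.
is_ipv4() (
    case "$1" in
        ''|*[!0-9.]*|.*|*.|*..*) exit 1 ;;  # empty, non-digit chars, stray dots
    esac
    IFS=.
    set -- $1                               # split the address on dots
    [ $# -eq 4 ] || exit 1
    for octet in "$@"; do
        case "$octet" in
            ????*) exit 1 ;;                # more than three digits
        esac
        [ "$octet" -le 255 ] || exit 1
    done
    exit 0
)

# Read one address per line and print only the IPv4 ones, so that a
# registration attempt is made once per component instead of once per protocol.
filter_ipv4() {
    while IFS= read -r addr; do
        if is_ipv4 "$addr"; then
            printf '%s\n' "$addr"
        fi
    done
}
```

Feeding the filter a mix of addresses, for example `printf '10.0.0.2\nfe80::1\n' | filter_ipv4`, passes through only the dotted-quad entry.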

Changed in eucalyptus (Ubuntu Maverick):
status: Confirmed → Fix Released
rowez (info-rowez) wrote :

I hit the problem described above on an Intel Xeon. I applied the workaround from comment #9 and ran --register-cluster without stopping and starting eucalyptus.

Using eucalyptus (2.0+bzr1241-0ubuntu4.1) on Maverick Meerkat (10.10).

After the reboot, I ran --deregister-cluster and then --register-cluster again. No errors!

But it is reproducible with the following sequence:

sudo euca_conf --list-nodes
(prints the registered node)
sudo euca_conf --deregister-nodes 10.0.0.2
sudo euca_conf --list-nodes
(prints the registered node)
sudo euca_conf --deregister-cluster cluster2011mrt
sudo euca_conf --register-cluster cluster2011mrt 10.0.0.2
sudo euca_conf --list-nodes
(prints nothing)
sudo euca_conf --register-nodes 10.0.0.2
sudo euca_conf --list-nodes
(prints nothing)
sudo euca_conf --deregister-cluster cluster2011mrt
sudo euca_conf --register-cluster cluster2011mrt 10.0.0.2
(takes a while, then prints "ERROR: failed to register new cluster, please log in to the admin interface and check cloud status.")
A few seconds later, trying again without workaround #9:
sudo euca_conf --register-cluster cluster2011mrt 10.0.0.2
(prints "SUCCESS: new cluster ........")

It looks like it is timing-related on an old Intel Xeon!

graziano obertelli (graziano.obertelli) wrote :

Marking this bug as fix released since the original bug https://bugs.launchpad.net/bugs/628328 has been fixed.

@rowez: if you still see the issue please re-open.

Changed in eucalyptus:
status: Fix Committed → Fix Released
graziano obertelli (graziano.obertelli) wrote :

@rowez: apologies, I meant, please file a new bug since this one is considered released now.
