20100222 images fail to boot in UEC (HTTP error 500 retrieving ephemeral0 metadata)

Bug #525675 reported by Thierry Carrez
This bug affects 1 person
Affects               Status        Importance  Assigned to        Milestone
Eucalyptus            Fix Released  Undecided   chris grzegorczyk
cloud-init (Ubuntu)   Invalid       Low         Scott Moser
  Lucid               Invalid       Low         Scott Moser
eucalyptus (Ubuntu)   Fix Released  High        Dustin Kirkland
  Lucid               Fix Released  High        Dustin Kirkland
python-boto (Ubuntu)  Invalid       Wishlist    Unassigned
  Lucid               Won't Fix     Wishlist    Unassigned

Bug Description

Binary package hint: cloud-init

When I start a 20100222 Lucid cloud image on UEC, the instance boots and its IP comes up, but SSH is never started.
I got the following errors from euca-get-console-output:

FATAL: Could not load /lib/modules/2.6.32-14-server/modules.dep: No such file or directory
[ 4.525542] kjournald starting. Commit interval 5 seconds
[ 4.527080] EXT3-fs: mounted filesystem with ordered data mode.
Begin: Running /scripts/local-bottom ...
Done.
Done.
Begin: Running /scripts/init-bottom ...
Begin: Starting AppArmor profiles ...
chroot: cannot execute /etc/apparmor/initramfs: No such file or directory
Failure: AppArmor profiles failed to load
Done.
Caught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance data

Might be a problem in eucalyptus rather than in the cloud image (metadata service not responding ?)

Revision history for this message
Thierry Carrez (ttx) wrote :

Reinstalled my UEC setup from today's archive:
- Karmic image works alright
- Lucid image still fails

So I suppose it's an issue in the image rather than in Eucalyptus.

Changed in cloud-init (Ubuntu):
assignee: nobody → Scott Moser (smoser)
importance: Undecided → High
milestone: none → lucid-alpha-3
status: New → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

It would seem that your metadata service is not up for some reason.

However, one thing to note here is that you seem to be trying to boot with an initramfs, but there is no initramfs in this build.

Revision history for this message
Thierry Carrez (ttx) wrote :

About the metadata service:
The metadata service seems to work well for the karmic cloud image ?

About the boot with initramfs:
The image was registered through uec-register-tarball, is there anything specific to do to avoid booting with an initramfs ?

Revision history for this message
Scott Moser (smoser) wrote :

I'm fairly sure I know what is wrong, or at least suspect something.

cloudinit/DataSourceEc2.py:
class DataSourceEc2(DataSource.DataSource):
    api_ver = '2009-04-04'

Where, in ec2init (in karmic):
./ec2init/__init__.py:
class EC2Init:
    api_ver = '2008-02-01'

The result of the above is that Lucid requests the newer version of the metadata service, which I don't think Eucalyptus makes available.

You can verify this in your Karmic instance with:

python -c 'import boto.utils; print(boto.utils.get_instance_userdata("2009-02-01"))'

I think that will work, but if you use the newer version (2009-04-04) I think it will fail.

Revision history for this message
Thierry Carrez (ttx) wrote :

In karmic instance:
boto.utils.get_instance_userdata("2009-02-01") --> returns ''
boto.utils.get_instance_userdata("2009-04-04") --> returns ''
boto.utils.get_instance_metadata("2009-02-01") --> returns a full dict
boto.utils.get_instance_metadata("2009-04-04") --> returns the same full dict

In karmic instance with boto 1.9:
boto.utils.get_instance_userdata("2009-02-01") --> returns ''
boto.utils.get_instance_userdata("2009-04-04") --> returns ''
boto.utils.get_instance_metadata("2009-02-01") --> hangs
boto.utils.get_instance_metadata("2009-04-04") --> hangs

I traced it back to the metadata enumeration. With boto 1.9b:

http://169.254.169.254/latest/meta-data/block-device-mapping/ returns:
'emi\nephemeral0\nroot\nswap'
http://169.254.169.254/latest/meta-data/block-device-mapping/emi returns:
sda1
http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0 returns error code 500.

So retry_url loops while trying to get http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0
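
To illustrate where the enumeration breaks, here is a minimal sketch of a recursive walk over the listing. It is not boto's actual code: `walk_metadata`, `fetch`, `MetadataError` and the in-memory `FAKE_SERVICE` are hypothetical names, and the values for root and swap are made-up placeholders. Unlike retry_url, the sketch records the failing path instead of retrying it forever.

```python
# Hypothetical in-memory stand-in for the metadata service described above:
# the listing advertises "ephemeral0", but no such entry is actually served.
FAKE_SERVICE = {
    "block-device-mapping/": "emi\nephemeral0\nroot\nswap",
    "block-device-mapping/emi": "sda1",
    "block-device-mapping/root": "placeholder-root",  # made-up value
    "block-device-mapping/swap": "placeholder-swap",  # made-up value
}


class MetadataError(Exception):
    """Raised by the fake fetcher to mimic an HTTP 500 reply."""


def fetch(path):
    """Return the stored value for path, or fail like the 500 seen above."""
    try:
        return FAKE_SERVICE[path]
    except KeyError:
        raise MetadataError("HTTP 500 for " + path)


def walk_metadata(prefix, fetch):
    """Recursively enumerate a listing; return (values, failed_paths)."""
    values, failed = {}, []
    for name in fetch(prefix).split("\n"):
        path = prefix + name
        try:
            if name.endswith("/"):
                # Sub-listing: descend and merge its results.
                sub_values, sub_failed = walk_metadata(path, fetch)
                values.update(sub_values)
                failed.extend(sub_failed)
            else:
                values[path] = fetch(path)
        except MetadataError:
            # Record the broken key and keep going instead of looping on it.
            failed.append(path)
    return values, failed


values, failed = walk_metadata("block-device-mapping/", fetch)
print(failed)  # prints ['block-device-mapping/ephemeral0']
```

Every advertised key except ephemeral0 is retrieved, which matches the single failing URL observed above.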

summary: - 20100222 images fail to boot in UEC
+ 20100222 images fail to boot in UEC (no ephemeral0 metadata)
Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Lucid):
assignee: nobody → Dustin Kirkland (kirkland)
importance: Undecided → High
milestone: none → lucid-alpha-3
status: New → Confirmed
Revision history for this message
Thierry Carrez (ttx) wrote :

Right, boto 1.9 implemented recursive metadata retrieval in get_instance_metadata (in boto/utils.py). That makes it fail if Eucalyptus advertises a key (ephemeral0) that it cannot actually serve (querying it returns error 500).

summary: - 20100222 images fail to boot in UEC (no ephemeral0 metadata)
+ 20100222 images fail to boot in UEC (HTTP error 500 retrieving
+ ephemeral0 metadata)
Changed in cloud-init (Ubuntu Lucid):
importance: High → Low
milestone: lucid-alpha-3 → none
status: Confirmed → Invalid
Revision history for this message
Scott Moser (smoser) wrote : Re: [Bug 525675] Re: 20100222 images fail to boot in UEC

On Mon, 22 Feb 2010, Thierry Carrez wrote:

> The image was registered through uec-register-tarball, is there anything specific to do to avoid booting with an initramfs ?
>

Does euca-describe-images show an eri?
If so, then uec-register-tarball is buggy.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Adding a wishlist priority task against boto; it could fail more gracefully with a more informative message.
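
As a rough idea of what failing more gracefully could look like, here is a sketch of a bounded retry that gives up with an informative message. It is not boto code: `fetch_with_retries`, `MetadataUnavailable` and the `get` callable are hypothetical names, and the retry count and delay are arbitrary assumptions.

```python
import time


class MetadataUnavailable(Exception):
    """Raised when a metadata URL still fails after the last attempt."""


def fetch_with_retries(get, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call get(url) up to `retries` times, then fail with a clear message.

    `get` is any callable that returns the body or raises IOError
    (urllib's URLError and HTTPError are IOError subclasses).
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return get(url)
        except IOError as exc:
            last_error = exc
            if attempt < retries:
                sleep(delay)  # back off before the next attempt
    raise MetadataUnavailable(
        "giving up on %s after %d attempts: %s" % (url, retries, last_error)
    )


def always_500(url):
    """Fake getter that mimics the failing ephemeral0 query."""
    raise IOError("HTTP Error 500")


URL = "http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0"
try:
    fetch_with_retries(always_500, URL, retries=2, sleep=lambda seconds: None)
except MetadataUnavailable as exc:
    message = str(exc)
    print(message)
```

With a message like this, the console output would name the offending URL instead of repeating a generic exception line.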

Changed in python-boto (Ubuntu Lucid):
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Thierry Carrez (ttx) wrote :

I think I nailed it:
in clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java

    m.put( "block-device-mapping/", "emi\nephemeral0\nroot\nswap" );
    m.put( "block-device-mapping/emi", "sda1" );
    m.put( "block-device-mapping/ami", "sda1" );
    m.put( "block-device-mapping/ephemeral", "sda2" );

That last line should probably read
    m.put( "block-device-mapping/ephemeral0", "sda2" );

Regression introduced in r906:
- m.put( "block-device-mapping/", "emi\nephemeral\nroot\nswap" );
+ m.put( "block-device-mapping/", "emi\nephemeral0\nroot\nswap" );
without changing the ephemeral reference below.
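
The mismatch can be checked mechanically. Below is a small sketch that models the map as a Python dict and reports names that a listing advertises without a matching entry. `unserved_entries` is a hypothetical helper, and the listing is trimmed to the entries whose m.put lines are quoted above (the root and swap entries are not shown here, so they are left out).

```python
def unserved_entries(metadata):
    """Return paths advertised by a 'dir/' listing that have no entry of their own."""
    missing = []
    for key, listing in metadata.items():
        if not key.endswith("/"):
            continue  # only keys ending in "/" are listings
        for name in listing.split("\n"):
            if key + name not in metadata:
                missing.append(key + name)
    return missing


# State after r906: the listing says "ephemeral0", the entry is still "ephemeral".
before_fix = {
    "block-device-mapping/": "emi\nephemeral0",
    "block-device-mapping/emi": "sda1",
    "block-device-mapping/ami": "sda1",
    "block-device-mapping/ephemeral": "sda2",
}

# Proposed fix: rename the entry so that it matches the listing.
after_fix = dict(before_fix)
after_fix["block-device-mapping/ephemeral0"] = after_fix.pop(
    "block-device-mapping/ephemeral"
)

print(unserved_entries(before_fix))  # prints ['block-device-mapping/ephemeral0']
print(unserved_entries(after_fix))   # prints []
```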

Changed in eucalyptus (Ubuntu Lucid):
status: Confirmed → Triaged
Revision history for this message
Thierry Carrez (ttx) wrote :

@Scott: no eri reference for the one without ramdisk, so it's ok.

Revision history for this message
Thierry Carrez (ttx) wrote :

For reference, regression introduced by an incomplete fix for bug 513842

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu3

---------------
eucalyptus (1.6.2-0ubuntu3) lucid; urgency=low

  [ Thierry Carrez ]
  * clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java:
    fix incomplete ephemeral block device mapping path, LP: #525675
 -- Dustin Kirkland <email address hidden> Mon, 22 Feb 2010 14:09:26 -0600

Changed in eucalyptus (Ubuntu Lucid):
status: Triaged → Fix Released
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Okay, I rolled a package with Thierry's fix. I bundled and uploaded today's Lucid image, ran that instance, and was able to successfully ssh into it.

Note, I did file a couple of new bugs, notably that I had to manually register the image, rather than using uec-publish*
 * Bug #525989

And there are still a few errors in the console output:
 * Bug #525994

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Dan/upstream Eucalyptus-

I have a minor fix for this bug that I'd like for you to review and merge into your 1.6.2 branch.

See:
  lp:~kirkland/eucalyptus/525675

Revision history for this message
chris grzegorczyk (chris-grze) wrote :

Hi Dustin,

This was fixed in the below revno. Closing the bug against Eucalyptus.

------------------------------------------------------------
    revno: 1185.1.1
    committer: decker <decker@hawaii>
    branch nick: metadata
    timestamp: Tue 2010-02-09 17:27:01 -0800
    message:
      ephemeral -> ephemeral0 LP:#513842
------------------------------------------------------------

Changed in eucalyptus:
status: New → Fix Released
assignee: nobody → chris grzegorczyk (chris-grze)
Revision history for this message
Dustin Kirkland  (kirkland) wrote : Re: [Bug 525675] Re: 20100222 images fail to boot in UEC (HTTP error 500 retrieving ephemeral0 metadata)

Chris-

Almost, but not quite ... Looks to me like this commit fixed one
ephemeral reference, but missed the other... Can you confirm/deny,
Chris?

bzr diff -r 1185..1185.1.1

=== modified file 'clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java'
--- clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java 2010-02-05 12:04:49 +0000
+++ clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java 2010-02-10 01:27:01 +0000
@@ -342,7 +342,7 @@
     m.put( "ramdisk-id", this.getImageInfo( ).getRamdiskId( ) );
     m.put( "security-groups", this.getNetworkNames( ).toString( ).replaceAll( "[\\Q[]\\E]", "" ).replaceAll( ", ", "\n" ) );

-    m.put( "block-device-mapping/", "emi\nephemeral\nroot\nswap" );
+    m.put( "block-device-mapping/", "emi\nephemeral0\nroot\nswap" );
     m.put( "block-device-mapping/emi", "sda1" );
     m.put( "block-device-mapping/ami", "sda1" );
     m.put( "block-device-mapping/ephemeral", "sda2" );

Revision history for this message
chris grzegorczyk (chris-grze) wrote :

Indeed. Not sure how that slipped by. Thanks for catching it. See r1200.


Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Erg, I think I'm hitting this again on an Alpha3 eucalyptus 1.6.2-0ubuntu4 installation.

Funny thing is that I did not hit it in topo1 (CLC+WC+CC+SC, NC), but I did hit it in topo3 (CLC+WC, CC+SC, NC).

I'm still investigating :-/

Revision history for this message
Thierry Carrez (ttx) wrote :

@Dustin: make sure it's the same bug. I'd rather think it's a global issue for the instance to access metadata in that topology, rather than specifically an issue with the "ephemeral0" key, and that would make it a new bug. Checking cloud-error.log should help.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Upon further inspection, it is definitely a different bug.

I have not actually seen this particular bug resurface. Only similar
symptoms. Thanks.

Thierry Carrez (ttx)
Changed in python-boto (Ubuntu Lucid):
status: Confirmed → Won't Fix
Scott Moser (smoser)
Changed in python-boto (Ubuntu):
status: Confirmed → Invalid