Cloud images on kvm use wrong disks

Bug #900799 reported by Alex Bligh
This bug affects 2 people
Affects: cloud-initramfs-tools (Ubuntu)
Status: Won't Fix
Importance: Low
Assigned to: Unassigned

Bug Description

When a cloud image is launched in a KVM-based cloud, two references to the same disk appear (/dev/sda and /dev/vda). The former is emulated (slow) and the latter is paravirtualised (fast). Both appear because (a) cloud providers cannot know a priori whether the kernel on the disk image (which is an opaque blob to them) supports the PV devices, and thus need to supply emulated devices, and (b) even where the kernel does support PV devices, the boot loader may not, instead using BIOS calls which in many cases only work for emulated devices.

When the cloud images boot, they have a kernel command line containing root=LABEL=foobar, and similarly mount by LABEL in fstab. The problem is that there are two disks with the same label, and unfortunately the kernel seems to scan them in module initialisation order (at least I think that's what's happening). As sd_mod is built in, this causes /dev/sda to be selected in preference to /dev/vda.
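The ambiguity described above can be made visible with a minimal sketch. It assumes blkid-style output lines of the form `device: LABEL="..." ...`. The sample text, the label `foobar` taken from the description, and the function name `duplicate_labels` are illustrative assumptions, not output captured from a real guest.

```python
import re
from collections import defaultdict

# Hypothetical blkid-style output for a KVM guest that exposes one backing
# disk through both an emulated and a virtio controller (not real output).
SAMPLE_BLKID = '''\
/dev/sda1: LABEL="foobar" UUID="1111-aaaa" TYPE="ext4"
/dev/vda1: LABEL="foobar" UUID="1111-aaaa" TYPE="ext4"
/dev/vdb1: LABEL="scratch" UUID="2222-bbbb" TYPE="ext4"
'''

# Capture the device path and the value of the LABEL field on each line.
LINE_RE = re.compile(r'^(?P<dev>\S+):\s.*?\bLABEL="(?P<label>[^"]*)"')


def duplicate_labels(blkid_text):
    """Return {label: [devices]} for every label seen on more than one device."""
    by_label = defaultdict(list)
    for line in blkid_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            by_label[match.group("label")].append(match.group("dev"))
    return {label: devs for label, devs in by_label.items() if len(devs) > 1}


if __name__ == "__main__":
    for label, devices in duplicate_labels(SAMPLE_BLKID).items():
        print(f"label {label!r} is ambiguous: {', '.join(devices)}")
```

Any label reported here is one for which a mount by LABEL depends on which device happens to be found first.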

Possible solutions here would be:

1. To emulate the Xen4 unplug school of thought, where the (virtual) hardware for the PCI device is unplugged early in boot. This would require extensive hypervisor changes, and may break existing images which are (stupidly) configured to run with /dev/sda as root even when /dev/vda is available.

2. To implement an ordering of preferred boot devices, and always prefer /dev/xvda and /dev/vda over /dev/sda.

3. To compile sd_mod in a modular manner, and blacklist this on the kernel command line. The disadvantage here is that for (e.g.) Xen3 hypervisors, this blacklisting would have to be removed.

4. To add some code to the beginning of the sd module source (but leave it built in) so that if a particular kernel option is set (e.g. sd_kvm_disable=1), the sd module does not initialise (and so does not add the relevant devices) when running on a KVM virtual machine. Cloud images could then have the appropriate additional line in grub.

I would suggest (4) is the simplest here, but others may have better ideas.
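As a rough illustration of option (2), the selection rule could look like the following sketch. It assumes the candidates are device paths that all carry the wanted label, and that preference is decided purely by name prefix. The names `PREFERRED_PREFIXES`, `device_rank` and `pick_root` are hypothetical and do not come from initramfs-tools or udev.

```python
# Assumed preference order: paravirtualised prefixes first, emulated last.
PREFERRED_PREFIXES = ("xvd", "vd", "sd")


def device_rank(path):
    """Return a sort key for a device path; lower means more preferred."""
    name = path.rsplit("/", 1)[-1]
    for rank, prefix in enumerate(PREFERRED_PREFIXES):
        if name.startswith(prefix):
            return rank
    # Unknown device types sort after every known prefix.
    return len(PREFERRED_PREFIXES)


def pick_root(candidates):
    """Choose the most preferred device among those sharing a label."""
    if not candidates:
        raise ValueError("no candidate devices")
    return min(candidates, key=device_rank)


if __name__ == "__main__":
    print(pick_root(["/dev/sda1", "/dev/vda1"]))
```

The sketch only covers the choice itself. It says nothing about where in the boot sequence such a rule would have to be applied.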

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-initramfs-tools (Ubuntu):
status: New → Confirmed
Stefan Bader (smb) wrote :

Just quickly thinking it over (meaning I likely need more thoughts here):

1. unlikely because I don't believe the kvm approach/model really aims to support this. It will be complex and would deviate too much from upstream.

2. maybe, but it could be bigger than initially obvious. It would have to cover not only the selection of root but also anything handled by mountall.

3. rather not as the sd driver is used for most bare metal disks and is for that reason built-in. And installs use the generic images.

4. maybe, but rather want to avoid special code in the kernel and it requires special images which again require to know what the guest will provide.

Alex Bligh (ubuntu-alex-org) wrote :

Personal thoughts:

1. Agree!

2. Agree with first sentence. Re second sentence, I /think/ when I last went through this it just uses /dev/.../by-uuid/... or whatever, so provided udev is right, the mountall stuff should be right.

3. Agree.

4. Not sure I understand "and it requires special images which again require to know what the guest will provide."; this bug was originally filed for cloud-images, and they surely know what they will provide, i.e. they are setting the kernel init line to match the kernel. That said, it is still a bit yucky.

As you can probably tell, I haven't found any spectacularly neat ideas myself. If (2) turns out to be easy (I know too little about udev to know), I think that's the best one we have so far, else (4).

Stefan Bader (smb) wrote :

A shame, I got sidetracked by other issues and have not had time to spend on an idea I had.

2. One thing to note there is that device detection is quite dynamic. Whatever disk with the right uuid or label is found first will likely be the one appearing in the by-uuid or by-label dirs. And I would guess it is not guaranteed to be the same.

4. It may not be doable in any way that would allow the same image to work transparently in any configuration, especially with the sort of setup that exposes the same storage area through two interfaces. And as you say, it is a special installation image anyway. I think the issue I have with the extra kernel argument is that it would again differ from upstream, which I would rather avoid.

Anyway, there was one thing that I had in mind and wanted to try, but then did not have the time to follow up. I was wondering whether, for an image targeted at that use, it would be worth trying to come up with a special multipath setup. In theory it should be possible to write a priority callout that gives PV disks a higher priority than emulated ones. This should arrange the multipath setup to create two groups and make the PV one the active group. One could use the device-mapper paths of devices instead of UUIDs, and theoretically it should work to have the emulated disk used (if that comes up first) and then transparently switch over to the PV disk. The only pain with that is changing partitions. One has (or had) to be careful about the partition volumes that kpartx creates. I would try to play around with that, but I am not sure when I will get to it. So should someone else have time, I'd be glad to get feedback here.
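The intended behaviour of such a setup can be modelled with a toy simulation. This is only a sketch of the idea under stated assumptions: it does not use multipath-tools or device-mapper, the priority values are arbitrary, and `path_priority` and `ToyMultipathMap` are hypothetical names invented for illustration.

```python
# Arbitrary illustrative priorities; only their relative order matters.
PV_PRIORITY = 50
EMULATED_PRIORITY = 10


def path_priority(device_name):
    """Toy priority callout: PV disks outrank emulated ones."""
    if device_name.startswith(("vd", "xvd")):
        return PV_PRIORITY
    return EMULATED_PRIORITY


class ToyMultipathMap:
    """Model of one multipath map whose paths are grouped by priority."""

    def __init__(self):
        self.paths = []

    def add_path(self, device_name):
        """Record a newly detected path to the same storage."""
        self.paths.append(device_name)

    def path_groups(self):
        """Return the path groups, highest priority first."""
        groups = {}
        for name in self.paths:
            groups.setdefault(path_priority(name), []).append(name)
        return [groups[prio] for prio in sorted(groups, reverse=True)]

    def active_group(self):
        """Return the group that would carry I/O, or [] if no path exists."""
        groups = self.path_groups()
        return groups[0] if groups else []


if __name__ == "__main__":
    mp = ToyMultipathMap()
    mp.add_path("sda")          # emulated disk shows up first
    print(mp.active_group())
    mp.add_path("vda")          # PV disk appears later and takes over
    print(mp.active_group())
```

In the model the emulated path carries I/O only until a PV path is added, which is the switch-over described above. Whether a real multipath configuration behaves this way for root devices, and how kpartx partition volumes interact with it, is left open here.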

Scott Moser (smoser)
Changed in cloud-initramfs-tools (Ubuntu):
importance: Undecided → Low
Scott Moser (smoser) wrote :

I'm going to mark this 'Won't Fix'. Please feel free to argue if it's still valid and set it back to Confirmed, but please make sure it is still relevant on 16.04 and recent Xen configurations.

Changed in cloud-initramfs-tools (Ubuntu):
status: Confirmed → Won't Fix