xen_emul_unplug=unnecessary on kernel cmdline is required in ec2 hvm

Bug #704022 reported by Scott Moser
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned

Bug Description

Under bug 684875 I added a cmdline option to the EC2 HVM instances [1]:
   xen_emul_unplug=unnecessary

This flag is required to get the instance to boot on EC2 HVM instances. With bug 684875 fixed, we expected that it would no longer be necessary. However, I tested today and it is still necessary with:
$ uname -r
2.6.37-12-virtual
$ dpkg -S /boot/vmlinuz-$(uname -r)
linux-image-2.6.37-12-virtual: /boot/vmlinuz-2.6.37-12-virtual
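To check whether a given boot actually used the flag, the running kernel's options can be read from /proc/cmdline (the same data apport records as ProcKernelCmdLine). The following is a minimal POSIX sh sketch; the function name has_unplug_flag is hypothetical and not part of any Ubuntu tool.

```shell
# Hypothetical helper: succeed (exit 0) if the kernel command line contains
# xen_emul_unplug=unnecessary as a whole, space-separated word.
# With no argument it inspects the running kernel's /proc/cmdline.
has_unplug_flag() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Pad with spaces so the option matches at the start or end of the line too.
    case " $cmdline " in
        *" xen_emul_unplug=unnecessary "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_unplug_flag && echo "booted with xen_emul_unplug=unnecessary"` reports on the current boot, while passing a string checks a candidate command line instead.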

I'll attach console logs.

--
[1] http://bazaar.launchpad.net/~ubuntu-on-ec2/ubuntu-on-ec2/ec2-publishing-scripts/revision/258#publish-build-ebs

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.37-12-virtual 2.6.37-12.26
Regression: No
Reproducible: Yes
ProcVersionSignature: User Name 2.6.37-12.26-virtual 2.6.37
Uname: Linux 2.6.37-12-virtual x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg: [ 20.210048] eth0: no IPv6 routers present
Date: Mon Jan 17 15:42:24 2011
Ec2AMI: ami-f442b39d
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: cc1.4xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
PciMultimedia:

ProcEnviron:
 LANG=en_US.UTF-8
 LC_MESSAGES=en_US.utf8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.37-12-virtual root=UUID=f7ac8333-d0de-4c19-b130-794dbd905a44 ro vt.handoff=7 console=ttyS0 xen_emul_unplug=unnecessary
RelatedPackageVersions:
 linux-restricted-modules-2.6.37-12-virtual N/A
 linux-backports-modules-2.6.37-12-virtual N/A
 linux-firmware 1.45
RfKill:

SourcePackage: linux
dmi.bios.date: 04/14/2010
dmi.bios.vendor: Xen
dmi.bios.version: 3.4.2
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr3.4.2:bd04/14/2010:svnXen:pnHVMdomU:pvr3.4.2:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 3.4.2
dmi.sys.vendor: Xen

Revision history for this message
Scott Moser (smoser) wrote :

I modified grub to have an additional boot option called 'CUSTOM-TEST' and booted into that.

$ ent="Ubuntu, with Linux $(uname -r)"
$ cust="CUSTOM-TEST"
$ sed -n -e "s/${ent}/${cust}/" \
  -e "/^menuentry '${cust}'/,/}/p" \
  < /boot/grub/grub.cfg | sudo tee /boot/grub/custom.cfg
$ sudo sed -i 's,xen_emul_unplug=unnecessary,,' /boot/grub/custom.cfg
$ sudo grub-set-default "${ent}"
$ sudo grub-reboot "${cust}"
$ sudo update-grub
$ grep -v "^#" /boot/grub/grubenv
saved_entry=CUSTOM-TEST
prev_saved_entry=Ubuntu, with Linux 2.6.37-12-virtual
$ cat /boot/grub/custom.cfg
menuentry 'CUSTOM-TEST' --class ubuntu --class gnu-linux --class gnu --class os {
 recordfail
 set gfxpayload=$linux_gfx_mode
 insmod part_msdos
 insmod ext2
 set root='(hd2,msdos1)'
 search --no-floppy --fs-uuid --set=root f7ac8333-d0de-4c19-b130-794dbd905a44
 linux /boot/vmlinuz-2.6.37-12-virtual root=UUID=f7ac8333-d0de-4c19-b130-794dbd905a44 ro vt.handoff=7 console=ttyS0
 initrd /boot/initrd.img-2.6.37-12-virtual
}
$ sudo reboot

The above does a "boot once" into the custom test entry, so that subsequent reboots go back to the working kernel entry.
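The one-shot selection is recorded in /boot/grub/grubenv, as the grep output above shows (saved_entry and prev_saved_entry). The variable names grub-reboot uses differ between GRUB versions (newer releases use next_entry), so treat them as an assumption. A minimal POSIX sh sketch for reading one value back; grubenv_get is a hypothetical name:

```shell
# Hypothetical helper: print the value of KEY from a GRUB environment block
# file (default /boot/grub/grubenv). The '#' padding lines never match.
grubenv_get() {
    key=$1
    file=${2:-/boot/grub/grubenv}
    # Anchor at line start so "saved_entry" does not match "prev_saved_entry".
    sed -n "s/^${key}=//p" "$file"
}
```

For example, `grubenv_get saved_entry` before rebooting should print CUSTOM-TEST on a system whose grubenv looks like the one above.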

Scott Moser (smoser)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Robert J Berger (rberger-runa) wrote :

We have been having ongoing issues with Ubuntu 11.04 HVM instances on AWS EC2 hitting ATA errors that cause long timeouts. AWS engineering is now saying this is due to, or related to, this bug. It seems hard to believe, but I thought you might be interested.

See: https://forums.aws.amazon.com/message.jspa?messageID=311995#311995

Is this true? If so, is there a Canonical Ubuntu HVM image that doesn't have this problem yet?

Revision history for this message
Stefan Bader (smb) wrote :

Using the pv devices rather than the emulated ones should hardly make a difference with respect to the timeouts and data loss reported here. It does have a performance benefit for the communication between the instance and the Xen host, though. We are also about to release kernel updates that build the required drivers in rather than as modules (as they are now). Those are currently in testing (for 11.04 and 11.10).

As for the timeouts, I wonder whether they could be an effect of storage on EC2 usually being network-attached, combined with high(er) general utilization. Moving from emulated to pv devices could actually make that even worse. Of course, there is also a chance that the pv driver handles such delays/contention better than the emulated device driver.

But it should be possible to switch to the pv drivers even with the current images. Add the xen-platform-pci, xen-netfront and xen-blkfront drivers to /etc/initramfs-tools/modules, run update-initramfs -u, and make sure that the root device in /boot/grub/grub.cfg and /etc/fstab is referenced by UUID or label (because on reboot with the pv drivers enabled, the block device names will switch from hd to xvd).
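The two preparatory checks above can be scripted. Below is a minimal POSIX sh sketch, assuming the module name xen-platform-pci (the name given in the next comment); both function names are hypothetical. The first appends module names to the modules file only if they are not already listed; the second prints fstab entries that still use bare /dev/sd*, /dev/hd* or /dev/xvd* device names, which are the ones likely to break when the names change.

```shell
# Hypothetical helper: append each module name to an initramfs-tools
# modules file unless a line with exactly that name already exists.
add_initramfs_modules() {
    file=$1
    shift
    for mod in "$@"; do
        # -q quiet, -x whole-line match, -F fixed string (no regex)
        grep -qxF "$mod" "$file" 2>/dev/null || printf '%s\n' "$mod" >> "$file"
    done
}

# Hypothetical helper: list non-comment fstab entries whose device field
# is a bare kernel device name rather than UUID= or LABEL=.
fstab_device_names() {
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/(sd|hd|xvd)/' "${1:-/etc/fstab}"
}
```

Run as root, `add_initramfs_modules /etc/initramfs-tools/modules xen-platform-pci xen-netfront xen-blkfront` followed by `update-initramfs -u` covers the first two steps; an empty result from `fstab_device_names` suggests /etc/fstab no longer depends on device names (grub.cfg still needs checking separately).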

Revision history for this message
Matt Wilson (msw-amazon) wrote :

Stefan,

The ec2 kernels already have xen-netfront and xen-blkfront compiled in. If xen-platform-pci was also compiled in, or included in the initramfs, then the HW emulation will be unplugged properly and you'll switch over to the PV drivers. The following results in PV drivers for the root volume on my test instance:

add xen-platform-pci to /etc/initramfs-tools/modules
run update-initramfs -u
remove xen_emul_unplug=unnecessary from the kernel boot command line
reboot
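Where the option lives depends on how the image was built. On a stock Ubuntu install, default kernel options come from GRUB_CMDLINE_LINUX / GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and are written into grub.cfg by update-grub (which overwrites manual edits to grub.cfg); whether the EC2 images set the flag there is an assumption. A minimal POSIX sh sketch of the third step as a string transformation; strip_unplug_flag is a hypothetical name, and it does not handle quoted values containing spaces:

```shell
# Hypothetical helper: print a kernel command line with the
# xen_emul_unplug=unnecessary option removed and spacing normalised.
# The body runs in a subshell so "set -f" (no globbing) does not leak.
strip_unplug_flag() (
    set -f
    out=
    for word in $1; do
        case $word in
            xen_emul_unplug=unnecessary) ;;        # drop this option
            *) out="${out:+$out }$word" ;;         # keep everything else
        esac
    done
    printf '%s\n' "$out"
)
```

For example, applied to the ProcKernelCmdLine value above it yields the same options minus the unplug flag, matching the linux line in the CUSTOM-TEST entry apart from BOOT_IMAGE.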

See also bug 804219

Revision history for this message
Robert J Berger (rberger-runa) wrote :

If what Matt says is true, shouldn't this be incorporated into the next release of the Ubuntu HVM images? I think people only notice this if they are running applications that get upset by a long pause, like HBase/Zookeeper. In our case the application notices the long pause, treats it as a timeout and shuts itself down. We've also seen spontaneous reboots on some of these instances.

We've had one other odd issue that may or may not be related: even when we run Unix "sync" and then take EBS snapshots, the snapshots are missing large amounts of the data from the server. All the files and directories are there, but many files are empty. We've seen this for the HDFS data blocks on the xfs filesystem with ami-f1589598.

The only way we could get a clean EBS snapshot was by detaching the EBS volume.

I would think this affects at least anyone running HBase, and it would hurt the performance of anyone using the Ubuntu images on Cluster instances. In some cases it could even lead to data corruption, if it caused an HDFS namenode to crash or even just shut down.

In the meantime we are trying out Matt's suggestion on one of our machines. It's non-trivial to do on a running cluster.

Thanks

Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix