natty kernel does not boot on ec2 t1.micro

Bug #686692 reported by Scott Moser
This bug affects 3 people
Affects          Status        Importance  Assigned to   Milestone
linux (Ubuntu)   Fix Released  High        Stefan Bader
Natty            Fix Released  High        Stefan Bader

Bug Description

This bug has been split off of bug 669496.
Instances of size t1.micro on EC2 do not boot with the natty kernel.

This is true of both i386 and amd64.

I've just tested with instances of:
us-east-1 ami-dece38b7 ebs/ubuntu-natty-daily-i386-server-20101207
us-east-1 ami-d4ce38bd ebs/ubuntu-natty-daily-amd64-server-20101207

There is no console output past the grub messages.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.37-8-virtual 2.6.37-8.21
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.37-8.21-virtual 2.6.37-rc4
Uname: Linux 2.6.37-8-virtual x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg:

Date: Tue Dec 7 18:27:29 2010
Ec2AMI: ami-d4ce38bd
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.large
Ec2Kernel: aki-427d952b
Ec2Ramdisk: unavailable
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: root=LABEL=uec-rootfs ro console=hvc0
ProcModules: acpiphp 19089 0 - Live 0xffffffffa0000000
SourcePackage: linux

Revision history for this message
Scott Moser (smoser) wrote :
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → High
milestone: none → natty-alpha-2
status: New → Confirmed
tags: added: kernel-series-unknown
Revision history for this message
Stefan Bader (smb) wrote :

Not the solution yet, unfortunately, but looking at bug #667796, we found that XEN_MAX_DOMAIN_MEMORY limits the memory a domU reports. In Natty, this has changed to a fixed config option of 128GB. That change came with a fairly large rework of the MMU code, and merely changing the value back to 70 is not enough to make it work again. But at least the following commit may be a place to start looking:

commit 58e05027b530ff081ecea68e38de8d59db8f87e0
Author: Jeremy Fitzhardinge <email address hidden>
Date: Fri Aug 27 13:28:48 2010 -0700

    xen: convert p2m to a 3 level tree

    Make the p2m structure a 3 level tree which covers the full possible
    physical space.

    The p2m structure contains mappings from the domain's pfns to system-wide
    mfns. The structure has 3 levels and two roots. The first root is for
    the domain's own use, and is linked with virtual addresses. The second
    is all mfn references, and is used by Xen on save/restore to allow it to
    update the p2m mapping for the domain.

    At boot, the domain builder provides a simple flat p2m array for all the
    initially present pages. We construct the two levels above that using
    the early_brk allocator. After early boot time, set_phys_to_machine()
    will allocate any missing levels using the normal kernel allocator
    (at GFP_KERNEL, so it must be called in a normal blocking context).

    Because the early_brk() API requires us to pre-reserve the maximum amount
    of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY
    config option, but its only negative side-effect is to increase the
    kernel's apparent bss size. However, since all unused brk memory is
    returned to the heap, there's no real downside to making it large.

tags: added: regression-release
removed: regression-update
Revision history for this message
Stefan Bader (smb) wrote :

Some updates here: the good news is that I am able to reproduce this on a local CentOS based installation. The bad news so far is that the DomU crashes so quickly that I get no output at all, even when directly attaching to the console on "xm create".

But at least I found a lead. The crashes happen if the guest memory is less than 1G and not divisible by 4. So 615M crashes, but 616M will boot (as will 612M and so on). There is also a visible change in the memory layout presented to Linux. While previously the max_pfn was used directly to create an e820 map, there is now an additional 8M added in the data returned by the memory hypercall. I cannot say right now whether that directly relates to the crash or not, but one can see that when starting a guest with mem=616, Linux will report 624M of memory. There is a lot of shifting around and recalculating going on which I have yet to understand.
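
For anyone wanting to check the numbers, here is a quick sketch of the arithmetic, assuming 4 KiB pages. The helper names are made up for illustration and are not taken from the kernel.

```python
PAGE_SIZE = 4096      # assumption: 4 KiB pages
MIB = 1024 * 1024


def mib_to_pages(mib):
    """Number of page frames needed to cover `mib` MiB of guest memory."""
    return mib * MIB // PAGE_SIZE


def divisible_by_4(mib):
    """The empirical rule observed so far: sizes divisible by 4 boot."""
    return mib % 4 == 0


# The extra 8M seen in the hypercall data, expressed in pages.
extra_pages = 8 * MIB // PAGE_SIZE

for mem in (612, 615, 616):
    print(mem, hex(mib_to_pages(mem)), "boots" if divisible_by_4(mem) else "crashes")

# A guest started with mem=616 would then see 616 + 8 MiB.
reported_mib = 616 + 8
```

Under these assumptions a 615M guest has 0x26700 page frames, and the additional 8M works out to 2048 pages.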

Revision history for this message
Scott Moser (smoser) wrote :

@Stefan,
  just for reference, could you attach your Xen config for this instance? I'd like to recreate this.

Revision history for this message
Stefan Bader (smb) wrote :

name = "NattyServerMicro32"
kernel = "/root/boot/pv-grub-hd0-V1.01-i386.gz"
memory = 616
vcpus = 1
disk = [ 'file:/root/amis/natty-server-uec-i386.img,sda1,w' ]
vif = [ '' ]

Not sure the vif really would work like this. I seem to have problems getting the boot completed (currently got the cloud-init stuff disabled as I have no magic meta server).

Revision history for this message
Stefan Bader (smb) wrote :

One further step finally. Using 'on_crash = "coredump-destroy"' and after creating /var/xen/dump, I was able to extract the following from the dump file:

<6>[ 0.000000] ACPI in unprivileged domain disabled
<3>[ 0.000000] max_pfn used = 26700(26700000)
<3>[ 0.000000] Xen: map base 0 + 26f00000
<3>[ 0.000000] Xen: map end = 26f00000
<3>[ 0.000000] map size reduzed to 26700000
<3>[ 0.000000] delta = 800000, extra_pages = 2048
<3>[ 0.000000] extra_mem_start = 26700000
<3>[ 0.000000] Xen: reserve c166f000c15d2000 - 800
<6>[ 0.000000] released 0 pages of unused memory
<3>[ 0.000000] Xen: extra_limit = 159488
<3>[ 0.000000] Xen: adding 2048 extra pages at 644874240
<6>[ 0.000000] BIOS-provided physical RAM map:
<6>[ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
<6>[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
<6>[ 0.000000] Xen: 0000000000100000 - 0000000026f00000 (usable)
<6>[ 0.000000] NX (Execute Disable) protection: active
<6>[ 0.000000] DMI not present or invalid.
<7>[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
<7>[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
<6>[ 0.000000] last_pfn = 0x26f00 max_arch_pfn = 0x1000000
<6>[ 0.000000] Scanning 0 areas for low memory corruption
<7>[ 0.000000] initial memory mapped : 0 - 01fff000
<6>[ 0.000000] init_memory_mapping: 0000000000000000-0000000026f00000
<7>[ 0.000000] 0000000000 - 0026f00000 page 4k
<7>[ 0.000000] kernel direct mapping tables up to 26f00000 @ 1ec4000-1fff000
<1>[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[ 0.000000] IP: [<c0107397>] xen_set_pte+0x27/0x60
<4>[ 0.000000] *pdpt = 0000000000000000 *pde = 0000000000000000
<0>[ 0.000000] Oops: 0003 [#1] SMP
<0>[ 0.000000] last sysfs file:
<4>[ 0.000000] Modules linked in:
<4>[ 0.000000]
<4>[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-12-virtual #26+lp686692v3 /
<4>[ 0.000000] EIP: e019:[<c0107397>] EFLAGS: 00010046 CPU: 0
<4>[ 0.000000] EIP is at xen_set_pte+0x27/0x60
<4>[ 0.000000] EAX: 00000000 EBX: c1fe7800 ECX: 00000000 EDX: c0848000
<4>[ 0.000000] ESI: 00000003 EDI: 00000000 EBP: c0849e14 ESP: c0849e04
<4>[ 0.000000] DS: e021 ES: e021 FS: 00d8 GS: 00e0 SS: e021
<0>[ 0.000000] Process swapper (pid: 0, ti=c0848000 task=c084f060 task.ti=c0848000)
<0>[ 0.000000] Stack:
<4>[ 0.000000] c1fe7800 c1fe7800 00000003 00000000 c0849e30 c08aa7ca 00000fff fffff003
<4>[ 0.000000] e6700000 00000000 00026700 c0849e38 c01362be c0849e8c c08b9961 c0849e64
<4>[ 0.000000] 46cf9ef8 00026701 c1fe7800 c0a3f998 00000133 00000100 00026f00 00000000
<0>[ 0.000000] Call Trace:
<4>[ 0.000000] [<c08aa7ca>] ? xen_set_pte_init+0x6b/0x72
<4>[ 0.000000] [<c01362be>] ? set_pte+0xe/0x10
<4>[ 0.000000] [<c08b9961>] ? kernel_physical_mapping_init+0x1c9/0x291
<4>[ 0.000000] [<c06122b6>] ? init_memory_mapping+0x1e6/0x340
<4>[ 0.000000] [<c08ac037>] ? setup_arch+0x6ce/0x935
<4>[ 0.000000] [<c010798e>] ? __raw_callee_save_xen_restore_fl+0x6/0x8
<4>[ 0.000000] [...


Revision history for this message
Stefan Bader (smb) wrote :

I think I see the issue now. When Xen sets up the p2m tree, it loops from 0 to max_pfn-1, incrementing by the number of p2m mappings in a leaf. If max_pfn corresponds to a multiple of 4M this works out. But if not, an additional leaf needs to be initialized (and it is only partially used).

I need to think about how to make this work best. Maybe the end_pfn needs to be rounded up to the next multiple of P2M_PER_PAGE. The next question would be how many places need to be touched, as there is at least one other place which sets up the corresponding pfn to mfn mapping...
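
A rough sketch of that loop and the rounding idea, not the kernel code itself. It assumes 4 KiB pages and 4-byte p2m entries on 32-bit (so 1024 entries per leaf), and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def p2m_per_page(entry_size):
    """Entries that fit in one leaf page (assumed 4 bytes each on 32-bit)."""
    return PAGE_SIZE // entry_size


def leaf_starts(max_pfn, per_page):
    """First pfn of every leaf touched when stepping from 0 up to max_pfn."""
    return list(range(0, max_pfn, per_page))


def round_up(pfn, per_page):
    """Round pfn up to the next multiple of per_page."""
    return (pfn + per_page - 1) // per_page * per_page


per_page = p2m_per_page(4)
max_pfn = 0x26700                      # the 615M guest
starts = leaf_starts(max_pfn, per_page)
used_in_last_leaf = max_pfn - starts[-1]

print(len(starts), used_in_last_leaf, hex(round_up(max_pfn, per_page)))
```

For the 615M case this gives 154 leaves, with only 768 of the 1024 entries in the last leaf in use, and a rounded-up end of 0x26800.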

Revision history for this message
Stefan Bader (smb) wrote :

<3>[ 0.000000] smb: pfn=266ff calling set_pte(c1fe77f8, 6b3003)
<3>[ 0.000000] smb: pfn=26700 calling set_pte(c1fe7800, 3)
<1>[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)

This was seen with some annotation. Basically pfn_pte for the last pfn returns an invalid pte.
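
To read the two values, here is a small decoding sketch. It assumes the set_pte arguments are printed in hex and follow the usual x86 4 KiB PTE layout (frame number from bit 12 upward, present in bit 0, writable in bit 1). decode_pte is a hypothetical helper, not a kernel function.

```python
PAGE_SHIFT = 12       # assumption: 4 KiB pages
PAGE_PRESENT = 0x1    # x86 PTE bit 0
PAGE_RW = 0x2         # x86 PTE bit 1


def decode_pte(value):
    """Split a raw PTE value into its frame number and two low flag bits."""
    return {
        "frame": value >> PAGE_SHIFT,
        "present": bool(value & PAGE_PRESENT),
        "writable": bool(value & PAGE_RW),
    }


good = decode_pte(0x6b3003)   # value passed for pfn=266ff
bad = decode_pte(0x3)         # value passed for pfn=26700

print(hex(good["frame"]), good["present"], good["writable"])
print(hex(bad["frame"]), bad["present"], bad["writable"])
```

Read this way, the second value carries the present and writable flags but a frame number of 0.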

Revision history for this message
Stefan Bader (smb) wrote :

Ok, so it was the right place but a completely wrong explanation. The problem is not that the last part of the pointers is missed, but that it is not missed. The kernel is given a flat array of address pointers by the domain constructor, along with the number of pointers in that array. With recent changes, the Xen kernel code tries to map this into a 3-level tree structure, where the leaves contain a part of that array. To conserve memory, the 2nd level points directly at parts of the flat array, which is ok as long as the whole 4k area contains valid pointers. But for memory assignments which are not a multiple of 4MB (or 2MB for 64bit), the last leaf would contain some undefined pointers instead of invalid markers.

The attached patch assumes that it is not good to meddle with the memory at the end of the external array, so if there is a final leaf that would only be partially filled, it allocates a new page, initializes it and then copies the valid pointers from the original array.
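
The idea can be modelled in a few lines. This is only a simulation of the approach described above, not the patch: INVALID is a stand-in for the kernel's invalid marker, build_leaves is a hypothetical name, and Python list slices copy data where the kernel would point straight into the flat array for full leaves.

```python
INVALID = -1  # hypothetical stand-in for the invalid p2m marker


def build_leaves(flat, per_page):
    """Split a flat pfn->mfn array into leaves of per_page entries.

    Full leaves are taken as-is (the kernel would reference the flat
    array in place). A partially filled final leaf gets a fresh page
    that is initialized to INVALID before the valid entries are copied.
    """
    leaves = []
    for start in range(0, len(flat), per_page):
        chunk = flat[start:start + per_page]
        if len(chunk) == per_page:
            leaves.append(chunk)
        else:
            page = [INVALID] * per_page      # new, fully initialized page
            page[:len(chunk)] = chunk        # copy only the valid pointers
            leaves.append(page)
    return leaves


# Tiny example: 10 entries with 4 per leaf leaves a half-filled last leaf.
leaves = build_leaves(list(range(100, 110)), 4)
print(leaves)
```

The point of the copy is that nothing past the end of the array handed over by the domain builder is ever read or written.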

tags: added: patch
Revision history for this message
Stefan Bader (smb) wrote :

With that patch applied I was able to successfully boot t1.micro instances with a 2.6.37 kernel:

ubuntu@ip-10-112-5-120:~$ uname -a
Linux ip-10-112-5-120 2.6.37-12-virtual #26+686692v2 SMP Thu Jan 20 11:30:38 UTC 2011 x86_64 GNU/Linux
ubuntu@ip-10-112-5-120:~$ echo $(wget -q -O- http://169.254.169.254/latest/meta-data/instance-type)
t1.micro
ubuntu@ip-10-112-5-120:~$ uname -m
x86_64

ubuntu@ip-10-117-61-4:~$ uname -a
Linux ip-10-117-61-4 2.6.37-12-virtual #26+686692v2 SMP Thu Jan 20 11:33:17 UTC 2011 i686 GNU/Linux
ubuntu@ip-10-117-61-4:~$ echo $(wget -q -O- http://169.254.169.254/latest/meta-data/instance-type)
t1.micro
ubuntu@ip-10-117-61-4:~$ uname -m
i686

Next step will be to send this upstream to see whether it is an acceptable approach or not.

Changed in linux (Ubuntu Natty):
status: Confirmed → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Natty):
status: In Progress → Fix Committed
Revision history for this message
Scott Moser (smoser) wrote :

I'm still unable to boot i386 instances. I tested
  us-east-1 ami-5c3fcf35 canonical ebs/ubuntu-natty-daily-i386-server-20110131
It resulted in no console output and unreachable instance in t1.micro.

So, i386 is still broken on t1.micro (the same ami does boot on m1.small).

However, x86_64 is functional. I just verified
us-east-1 ami-2e3fcf47 canonical ebs/ubuntu-natty-daily-amd64-server-20110131

$ uname -r
2.6.38-1-virtual
$ uname -m
x86_64
$ ec2metadata --instance-type
t1.micro
$ dpkg -S /boot/vmlinuz-$(uname -r)
linux-image-2.6.38-1-virtual: /boot/vmlinuz-2.6.38-1-virtual

Martin Pitt (pitti)
Changed in linux (Ubuntu Natty):
milestone: natty-alpha-2 → natty-alpha-3
Revision history for this message
Scott Moser (smoser) wrote :

This was fix-released by Stefan in 2.6.38-1.28. Alpha2 boots in amd64 in t1.micro. We've opened bug 710754 to address the i386 issue.

Changed in linux (Ubuntu Natty):
status: Fix Committed → Fix Released
Revision history for this message
Andy Whitcroft (apw) wrote :

This bug was fixed in the package linux - 2.6.38-1.27

---------------
linux (2.6.38-1.27) natty; urgency=low

  [ Andy Whitcroft ]

  * ubuntu: AUFS -- update aufs-update to track new locations of headers
  * ubuntu: AUFS -- update to c5021514085a5d96364e096dbd34cadb2251abfd
  * SAUCE: ensure root is ready before running usermodehelpers in it
  * correct the Vcs linkage to point to natty
  * rebase to linux tip e78bf5e6cbe837daa6ab628a5f679548742994d3
  * [Config] update configs following rebase
    e78bf5e6cbe837daa6ab628a5f679548742994d3
  * SAUCE: Yama: follow changes to generic_permission
  * ubuntu: compcache -- follow changes to bd_claim/bd_release
  * ubuntu: iscsitarget -- follow changes to open_bdev_exclusive
  * ubuntu: ndiswrapper -- fix interaction between __packed and packed
  * ubuntu: AUFS -- update to 806051bcbeec27748aae2b7957726a4e63ff308e
  * update package version to match payload version
  * rebase to e6f597a1425b5af64917be3448b29e2d5a585ac8
  * rebase to v2.6.38-rc1
  * [Config] updateconfigs following rebase to v2.6.38-rc1
  * SAUCE: x86 fix up jiffies/jiffies_64 handling
  * rebase to linus tip 2b1caf6ed7b888c95a1909d343799672731651a5
  * [Config] updateconfigs following rebase to
    2b1caf6ed7b888c95a1909d343799672731651a5
  * [Config] disable CONFIG_TRANSPARENT_HUGEPAGE to fix i386 boot crashes
  * ubuntu: AUFS -- suppress benign plink warning messages
    - LP: #621195
  * [Config] CONFIG_NR_CPUS=256 for amd64 -server flavour
  * rebase to v2.6.38-rc2
  * rebase to mainline d315777b32a4696feb86f2a0c9e9f39c94683649
  * rebase to c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84
  * [Config] update configs following rebase to
    c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84
  * [Config] disable CONFIG_AD7152 to fix FTBS on armel versatile
  * [Config] disable CONFIG_AD7150 to fix FTBS on armel versatile
  * [Config] disable CONFIG_RTL8192CE to fix FTBS on armel omap
  * [Config] disable CONFIG_MANTIS_CORE to fix FTBS on armel versatile

  [ Kees Cook ]

  * SAUCE: kernel: make /proc/kallsyms mode 400 to reduce ease of attacking

  [ Stefan Bader ]

  * Temporarily disable RODATA for virtual i386
    - LP: #699828

  [ Tim Gardner ]

  * [Config] CONFIG_NLS_DEFAULT=utf8
    - LP: #683690
  * [Config] CONFIG_HIBERNATION=n
  * update bnx2 firmware files in d-i/firmware/nic-modules

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon/bo: add some fallback placements for VRAM only
    objects."
  * packaging: make System.map mode 0600
  * thinkpad_acpi: Always report scancodes for hotkeys
    - LP: #702407
  * sched: tg->se->load should be initialised to tg->shares
  * Input: sysrq -- ensure sysrq_enabled and __sysrq_enabled are consistent
  * brcm80211: include linux/slab.h for kfree
  * pch_dma: add include/slab.h for kfree
  * i2c-eg20t: include linux/slab.h for kfree
  * gpio/ml_ioh_gpio: include linux/slab.h for kfree
  * tty: include linux/slab.h for kfree
  * winbond: include linux/delay.h for mdelay et al

  [ Upstream Kernel Changes ]

  * mark the start of v2.6.38 versioning
  * rebase v2.6.37 to v2.6.38-rc2 + c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84
    - LP: #689886
    - LP: #702125
    - LP: #608775
    - LP: #215802
...


Revision history for this message
Matt Wilson (msw-amazon) wrote :

The permanent fix for this is likely in PV-GRUB. See: https://patchwork.kernel.org/patch/727511/
