Karmic i386 EC2 kernel emulating unsupported memory accesses

Bug #427288 reported by John Johansen
This bug affects 2 people
Affects               Status        Importance  Assigned to     Milestone
DEPRECATED Pantheon   New           Undecided   Unassigned
VMBuilder             Fix Released  Undecided   Unassigned
eglibc (Ubuntu)       Fix Released  High        Steve Langasek
  Karmic              Fix Released  High        Steve Langasek
vm-builder (Ubuntu)   Fix Released  High        Scott Moser
  Karmic              Fix Released  High        Scott Moser

Bug Description

When booting the Karmic alpha5 test image ami-3520c05c with the Karmic test kernel aki-dc06e6b5, the following warning is issued:

[ 1.588049] ***************************************************************
[ 1.588055] ***************************************************************
[ 1.588060] ** WARNING: Currently emulating unsupported memory accesses **
[ 1.588065] ** in /lib/tls glibc libraries. The emulation is **
[ 1.588070] ** slow. To ensure full performance you should **
[ 1.588075] ** install a 'xen-friendly' (nosegneg) version of **
[ 1.588079] ** the library, or disable tls support by executing **
[ 1.588084] ** the following as root: **
[ 1.588088] ** mv /lib/tls /lib/tls.disabled **
[ 1.588093] ** Offending process: init (pid=1007) **
[ 1.588098] ***************************************************************
[ 1.588102] ***************************************************************

Tags: ec2-images
Changed in linux (Ubuntu):
assignee: nobody → John Johansen (jjohansen)
John Johansen (jjohansen) wrote :

There appear to be two potential fixes:

1. A kernel patch that disables fixup_4gb_segment in the kernel (attached).
2. Building glibc with -mno-tls-direct-seg-refs.

Some relevant links with more information on this.

http://wiki.xensource.com/xenwiki/XenSpecificGlibc

http://wiki.xensource.com/xenwiki/XenSegments

Scott Moser (smoser)
tags: added: ec2-images
Steve Langasek (vorlon)
Changed in linux (Ubuntu Karmic):
milestone: none → ubuntu-9.10-beta
importance: Undecided → High
Scott Moser (smoser) wrote :

From Eric Hammond:
| I ended up with the following code when I build images for EC2;
| not sure if it fixes the bug just mentioned:
| http://paste.ubuntu.com/268982/
|> chroot $imagedir apt-get install -y libc6-xen
|> echo 'hwcap 0 nosegneg' > $imagedir/etc/ld.so.conf.d/libc6-xen.conf
|> chroot $imagedir apt-get remove -y libc6-i686 || true
|> chroot $imagedir ldconfig

Matt Zimmerman (mdz)
Changed in linux (Ubuntu Karmic):
status: New → Triaged
Scott Moser (smoser) wrote :

The next thing to be done here is to test if this bug is fixed with libc6-xen (http://packages.ubuntu.com/karmic/libc6-xen).

I will test that.

libc6-xen depends on libc6, and its description says it "will be selected instead when running under Xen." Hopefully that means there is no negative effect if this package is installed but the image is run on KVM (in UEC).

Scott Moser (smoser) wrote :

I did a quick test to see if simple installation of libc6-xen would solve
this issue. It appears not. See below for details:

[localsys]$ ec2-run-instances --user-data foo \
   --kernel aki-9c04e4f5 --ramdisk ari-9e04e4f7 ami-3520c05c

# note '--user-data foo' is just to work around bug 419306

[localsys]$ ssh -i ${keypair_key_us} ubuntu@ec2...
$ dmesg | grep ' \*\*'
[ 1.795832] ***************************************************************
[ 1.795836] ***************************************************************
[ 1.795840] ** WARNING: Currently emulating unsupported memory accesses **
[ 1.795844] ** in /lib/tls glibc libraries. The emulation is **
[ 1.795848] ** slow. To ensure full performance you should **
[ 1.795852] ** install a 'xen-friendly' (nosegneg) version of **
[ 1.795855] ** the library, or disable tls support by executing **
[ 1.795859] ** the following as root: **
[ 1.795863] ** mv /lib/tls /lib/tls.disabled **
[ 1.795867] ** Offending process: init (pid=1012) **
[ 1.795871] ***************************************************************
[ 1.795874] ***************************************************************

$ sudo apt-get update
$ sudo apt-get install libc6-xen
...
Get:1 http://us.ec2.archive.ubuntu.com karmic/universe libc6-xen 2.10.1-0ubuntu8 [1253kB]
...
$ sudo reboot

[localsys]$ ssh -i ${keypair_key_us} ubuntu@ec2...

$ dmesg | grep ' \*\*'
...
[ 1.795840] ** WARNING: Currently emulating unsupported memory accesses **
[ 1.795844] ** in /lib/tls glibc libraries. The emulation is **
...
[ 1.795867] ** Offending process: init (pid=1012) **
...

Scott Moser (smoser) wrote :

Some more information: on the same system described above, the following additional steps remove the warning:

$ echo 'hwcap 0 nosegneg' | sudo tee -a /etc/ld.so.conf.d/libc6-xen.conf
$ sudo ldconfig
$ sudo apt-get --purge remove libc6-i686

Eric pointed at
 - http://groups.google.com/group/ec2ubuntu/browse_thread/thread/1a3fd33f04766361/8f82524bd298a4a2
 which links to
 - http://wiki.xensource.com/xenwiki/DebianTlsLibcDiversion

So it appears we can find a solution, but at the moment everything I see has negative effects on non-Xen systems.

Eric Hammond (esh) wrote :

I may be biased (ok yes, I am biased) but it seems to me that if Xen needs to be configured differently than KVM, then it should be and we should be building separate images for EC2 and UEC. The EC2 images for Ubuntu should be the best EC2 images possible without compromise.

I can understand the desire to use the same images across platforms and hope that there is another solution that can accomplish this.

Eric Hammond (esh) wrote :

The xen-divert-tls-libc solution requires the user to know that special tweaking has been done to the system and causes problems in certain libc6 upgrades which require specialized manual intervention not necessary on standard Ubuntu. Remember that "upgrade" is a common practice when running EC2 images and creating ways for it to fail can cause business critical applications to experience outages.

Matt Zimmerman (mdz) wrote : Re: [Bug 427288] Re: Karmic i386 EC2 kernel emulating unsupported memory accesses

On Fri, Sep 11, 2009 at 08:13:57PM -0000, Eric Hammond wrote:
> I may be biased (ok yes, I am biased) but it seems to me that if Xen
> needs to be configured differently than KVM, then it should be and we
> should be building separate images for EC2 and UEC. The EC2 images for
> Ubuntu should be the best EC2 images possible without compromise.

We should do that if necessary, but *only* if necessary. Using a single
image cuts our testing workload in half, and if we can find a solution which
enables us to continue using a single image, we should pursue it.

--
 - mdz

Steve Langasek (vorlon) wrote :

Explanation of how the 'nosegneg' hwcap is supposed to work is found here:

http://lkml.org/lkml/2007/4/24/3

Upstream patch is here:

http://lkml.org/lkml/2007/4/23/339

From my testing, the linux-ec2 kernel does *not* set this hwcap in its vdso, therefore there's nothing to tell ld.so to use the nosegneg paths, regardless of whether /etc/ld.so.conf.d/ is configured correctly (which currently has to be done by hand; separate task will be opened).

The only reason the tested sequence had any effect at all was because libc6-i686 was removed from the system - even after doing this, the libc being used is the i486-optimized one in /lib, not the xen build in /lib/tls/i686/nosegneg. I'm hopeful that if we get nosegneg working, then there's no need to remove libc6-i686, so the image will work without penalty in both contexts.
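
To tell these cases apart quickly, the libc path reported by ldd can be classified. This is only a sketch under my own assumptions: the function names and the three labels ('nosegneg', 'tls-other', 'plain') are hypothetical, and the path patterns are taken from the directories mentioned in this report.

```python
import re

# One "name => path (address)" line of ldd output. Lines without a path,
# such as "linux-gate.so.1 => (0x...)", do not match.
LDD_LINE_RE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-f]+\)\s*$")


def resolved_libs(ldd_output):
    """Map each library name to the path ldd resolved it to."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE_RE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def libc_variant(ldd_output):
    """Classify the libc.so.6 path: 'nosegneg', 'tls-other', 'plain' or None."""
    path = resolved_libs(ldd_output).get("libc.so.6")
    if path is None:
        return None
    if "/nosegneg/" in path:
        return "nosegneg"      # the Xen build from libc6-xen
    if path.startswith("/lib/tls/"):
        return "tls-other"     # some other optimized build under /lib/tls
    return "plain"             # the default build, e.g. /lib/libc.so.6
```

Passing it the output of `ldd /bin/bash` should give 'plain' in the situation described here, and 'nosegneg' once the hwcap is actually honoured.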

Steve Langasek (vorlon) wrote :

current arch/x86/vdso/vdso32/note.S in linux-ec2 is doing this instead:

ELFNOTE_START(GNU, 2, "a")
        .long 1 /* ncaps */
#ifdef CONFIG_PARAVIRT_XEN
VDSO32_NOTE_MASK: /* Symbol used by arch/x86/xen/setup.c */
        .long 0 /* mask */
#else
        .long 1 << VDSO_NOTE_NONEGSEG_BIT /* mask */
#endif
        .byte VDSO_NOTE_NONEGSEG_BIT; .asciz "nosegneg" /* bit, name */
ELFNOTE_END

So this looks like a different note name; checking libc6-xen now to see if it's been updated to match.

Steve Langasek (vorlon) wrote :

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=499366 suggests that the problem may be that this should be 'hwcap 1 nosegneg', not 'hwcap 0 nosegneg'.

The libc6-xen package is supposed to ship the file ./etc/ld.so.conf.d/libc6-xen.conf. The Ubuntu package is missing this file.
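
Since the bit index in the 'hwcap' line is the point of contention, here is a small sketch for reading it back from a conffile. The helper names are hypothetical, and it assumes the 'hwcap <index> <name>' layout used in this report, with '#' starting a comment. Which index is correct depends on what the running kernel advertises, so the code only reports the value and does not judge it.

```python
def hwcap_entries(conf_text):
    """Return (index, name) for each 'hwcap <index> <name>' line in conf_text."""
    entries = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        fields = line.split()
        if len(fields) == 3 and fields[0] == "hwcap" and fields[1].isdigit():
            entries.append((int(fields[1]), fields[2]))
    return entries


def nosegneg_bit(conf_text):
    """Return the bit index configured for 'nosegneg', or None if absent."""
    for index, name in hwcap_entries(conf_text):
        if name == "nosegneg":
            return index
    return None
```

A return value of None for every file under /etc/ld.so.conf.d/ would match the missing-conffile case described above.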

Steve Langasek (vorlon)
affects: linux (Ubuntu Karmic) → eglibc (Ubuntu Karmic)
Changed in eglibc (Ubuntu Karmic):
assignee: John Johansen (jjohansen) → Steve Langasek (vorlon)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.10.1-0ubuntu9

---------------
eglibc (2.10.1-0ubuntu9) karmic; urgency=low

  * debian/sysdeps/i386.mk: cherrypick fix from Debian, lost somewhere along
    the way, that prevents /etc/ld.so.conf.d/xen.conf being added to the
    libc6-xen package. LP: #427288. This still leaves us with a delta
    relative to the Debian conffile name, which we ought to clean up at some
    later date.

 -- Steve Langasek <email address hidden> Sat, 12 Sep 2009 00:31:45 +0000

Changed in eglibc (Ubuntu Karmic):
status: Triaged → Fix Released
Eric Hammond (esh) wrote :

https://bugs.launchpad.net/ubuntu/jaunty/+source/glibc/+bug/246625 talks about the kernel version where it changed from "hwcap 0 nosegneg" to "hwcap 1 nosegneg" along with other things which sound important and may or may not be related to this issue.

The current bug has been marked "Fix Released". Does that mean the next EC2 image built will no longer have this problem? Scott, can you test this?

Steve Langasek (vorlon) wrote :

The eglibc side of this bug is fixed (or should be - the package just finished building on the buildds and looks correct, now it's a matter of testing that it all works together). To fix it in the EC2 images, the build will need to be updated to pull in both libc6-i686 and libc6-xen by default; I don't know where this is done, it doesn't appear to be part of the Ubuntu package seeds.

Scott Moser (smoser)
Changed in vm-builder (Ubuntu Karmic):
importance: Undecided → High
assignee: nobody → Scott Moser (smoser)
Scott Moser (smoser) wrote :

I've just verified on ami-a40fefcd, which uses the kernel in question, that installing libc6-xen fixes this problem in EC2.

I booted the instance, then 'apt-get update && apt-get install libc6-xen'. After a reboot, I have:

$ uname -r
2.6.31-300-ec2
$ dpkg -l "libc6*" | grep ^ii
ii libc6 2.10.1-0ubuntu11 GNU C Library: Shared libraries
ii libc6-i686 2.10.1-0ubuntu11 GNU C Library: Shared libraries [i686 optimi
ii libc6-xen 2.10.1-0ubuntu11 GNU C Library: Shared libraries [Xen version
$ ldd /bin/bash
        linux-gate.so.1 => (0xb7ef6000)
        libncurses.so.5 => /lib/libncurses.so.5 (0xb7eb5000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7eb1000)
        libc.so.6 => /lib/libc.so.6 (0xb7d54000)
        /lib/ld-linux.so.2 (0xb7ef7000)
$ dmesg | grep "\*\*.*WARN" || echo "no warnings"
no warnings
$ time perl -e 'glob("xxx*")'
real 0m0.007s
user 0m0.000s
sys 0m0.000s

Steve tracked the problem down to /etc/ld.so.nohwcap, which existed and was causing the problems. After removing that file and running 'ldconfig', 'ldd /bin/bash' shows 'libc.so.6 => /lib/tls/i686/nosegneg/libc.so.6'.
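
A check for that file is easy to script when verifying an image. This is a minimal sketch: the function name and the `root` parameter (for pointing at an unpacked image instead of the live system) are my own additions.

```python
import os


def nohwcap_flag(root="/"):
    """Return the path of etc/ld.so.nohwcap under root if it exists, else None."""
    path = os.path.join(root, "etc", "ld.so.nohwcap")
    return path if os.path.exists(path) else None
```

If it returns a path, removing the file and rerunning ldconfig, as described above, is the step to try.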

Changed in eglibc (Ubuntu Karmic):
status: Fix Released → In Progress
Steve Langasek (vorlon)
Changed in eglibc (Ubuntu Karmic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.10.1-0ubuntu12

---------------
eglibc (2.10.1-0ubuntu12) karmic; urgency=low

  [ Steve Langasek ]
  * Restore missing depends/conflicts/replaces handling for findutils and
    belocs-locales-bin, lost in the latest merge.
  * Move ldconfig trigger handling to libc-bin postinst, since that's where
    ldconfig and the trigger are actually located.
  * Drop debian/local/etc_init.d from the source, which is no longer shipped
    in the package having been dropped in Debian
  * debian/rules.d/debhelper.mk: revert breakage from Debian experimental;
    pulling in file substitutions from script.in has to happen before
    substituting other tokens, since script.in/nohwcap.sh contains other
    tokens that have to be replaced. LP: #427288.

  [ Matthias Klose ]
  * Don't apply hppa patches, don't apply
    any/local-linuxthreads-kill_other.diff.

 -- Steve Langasek <email address hidden> Mon, 14 Sep 2009 16:16:10 -0700

Changed in eglibc (Ubuntu Karmic):
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote :

I tested this on ami-a40fefcd (alpha5.1)

I verified that before installing libc6-xen, ldd reports bash using /lib/tls/i686/cmov/libc.so.6. After installation, it shows /lib/tls/i686/nosegneg/libdl.so.2.

However, after a reboot, I still see the message. I suspect this is coming from initrd, or some early boot, and not used after that.

See attachment for a log of testing/verification.

Scott Moser (smoser) wrote :

Because of the state of the archive, I can't test it right now, but the patch we need to vmbuilder is

--- VMBuilder/plugins/ubuntu/karmic.py.orig 2009-09-15 16:09:51.000000000 -0400
+++ VMBuilder/plugins/ubuntu/karmic.py 2009-09-15 16:09:57.000000000 -0400
@@ -27,9 +27,10 @@ class Karmic(Jaunty):
         self.vm.addpkg += ['ec2-init',
                           'openssh-server',
                           'standard^',
                           'ec2-ami-tools',
-                           'update-motd']
+                           'update-motd',
+                           'libc6-xen']

     def update_passwords(self):
         # Set the user password, using using defaults from /etc/login.defs (ie, no need to specify '-m')
         self.run_in_target('chpasswd', stdin=('%s:%s\n' % (self.vm.user, getattr(self.vm, 'pass'))))

Changed in vm-builder (Ubuntu Karmic):
status: New → In Progress
Scott Moser (smoser) wrote :

<soren> Never mind. I manually merged them.
<soren> And pushed them to nectarine.

"them" refers to [1] and [2], which fix bug 420581 and this one (bug 427288) respectively. The next nightly builds at [3] should have libc6-xen in them.

[1] http://bazaar.launchpad.net/~ubuntu-virt/vmbuilder/trunk/revision/338
[2] http://bazaar.launchpad.net/~ubuntu-virt/vmbuilder/trunk/revision/339
[3] http://uec-images.ubuntu.com/karmic/

Changed in vm-builder (Ubuntu Karmic):
status: In Progress → Fix Released
Scott Moser (smoser) wrote :

Flipping this back to Fix Committed. I'll mark it Fix Released when we get a build output with libc6-xen in it.

Changed in vm-builder (Ubuntu Karmic):
status: Fix Released → Fix Committed
Eric Hammond (esh)
Changed in vmbuilder:
status: New → Invalid
Scott Moser (smoser)
Changed in vmbuilder:
status: Invalid → Fix Committed
Scott Moser (smoser) wrote :

For the hopefully final record, the images do indeed still show a warning like the one above, but only for dbus programs (bug 432718).

Steve Langasek (vorlon) wrote :

dbus has been fixed; the only required vmbuilder change was made long ago, so closing this out.

Changed in vm-builder (Ubuntu Karmic):
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote :

This is fixed in 0.11.1-0ubuntu1.

Changed in vm-builder (Ubuntu Karmic):
status: Fix Released → Fix Committed
Steve Langasek (vorlon)
Changed in vmbuilder:
status: Fix Committed → Fix Released
Steve Langasek (vorlon)
Changed in vm-builder (Ubuntu Karmic):
status: Fix Committed → Fix Released
Shlomo (shlomo-swidler) wrote :

I'm running the Canonical base Karmic 32-bit AMI-1515f67c in EC2.

I get this in /var/log/syslog from boot:

Dec 27 12:57:29 ubuntu kernel: [ 7.011383] ***************************************************************
Dec 27 12:57:29 ubuntu kernel: [ 7.011388] ***************************************************************
Dec 27 12:57:29 ubuntu kernel: [ 7.011392] ** WARNING: Currently emulating unsupported memory accesses **
Dec 27 12:57:29 ubuntu kernel: [ 7.011396] ** in /lib/tls glibc libraries. The emulation is **
Dec 27 12:57:29 ubuntu kernel: [ 7.011400] ** slow. To ensure full performance you should **
Dec 27 12:57:29 ubuntu kernel: [ 7.011404] ** install a 'xen-friendly' (nosegneg) version of **
Dec 27 12:57:29 ubuntu kernel: [ 7.011408] ** the library, or disable tls support by executing **
Dec 27 12:57:29 ubuntu kernel: [ 7.011412] ** the following as root: **
Dec 27 12:57:29 ubuntu kernel: [ 7.011415] ** mv /lib/tls /lib/tls.disabled **
Dec 27 12:57:29 ubuntu kernel: [ 7.011420] ** Offending process: apache2 (pid=929) **
Dec 27 12:57:29 ubuntu kernel: [ 7.011424] ***************************************************************
Dec 27 12:57:29 ubuntu kernel: [ 7.011428] ***************************************************************

Should this be fixed in this AMI?

And, how can I tell what fixes have made it into each AMI?

Scott Moser (smoser) wrote :

Shlomo,
  Please see comment 21 above. You'll see the warning, but it will only affect dbus programs.

Kostas Chatzikokolakis (kostas-chatzi) wrote :

This problem is still present in lucid when using imagemagick (and I presume other packages), even though it does not appear when booting.

To reproduce
- launch "ami-a403f7cd" (latest lucid 32bit). dmesg output is clean
- install the 'imagemagick' package
- run 'convert' without any argument. The command takes 10+ seconds to run. After finishing dmesg output contains the following:

** WARNING: Currently emulating unsupported memory accesses **
** in /lib/tls glibc libraries. The emulation is **
** slow. To ensure full performance you should **
** install a 'xen-friendly' (nosegneg) version of **
** the library, or disable tls support by executing **
** the following as root: **
** mv /lib/tls /lib/tls.disabled **
** Offending process: convert (pid=910) **

Deleting /lib/tls does not help.
Interestingly, the issue does _not_ appear in the latest maverick.

ldd /usr/bin/convert output:
        linux-gate.so.1 => (0xb78e2000)
        libMagickCore.so.2 => /usr/lib/libMagickCore.so.2 (0xb76a3000)
        libMagickWand.so.2 => /usr/lib/libMagickWand.so.2 (0xb758e000)
        liblcms.so.1 => /usr/lib/liblcms.so.1 (0xb755a000)
        libtiff.so.4 => /usr/lib/libtiff.so.4 (0xb74fe000)
        libc.so.6 => /lib/tls/i686/nosegneg/libc.so.6 (0xb73a0000)
        libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0xb732a000)
        libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0xb7309000)
        libXext.so.6 => /usr/lib/libXext.so.6 (0xb72f9000)
        libXt.so.6 => /usr/lib/libXt.so.6 (0xb72a5000)
        libbz2.so.1.0 => /lib/libbz2.so.1.0 (0xb7293000)
        libz.so.1 => /lib/libz.so.1 (0xb727e000)
        libpthread.so.0 => /lib/tls/i686/nosegneg/libpthread.so.0 (0xb7265000)
        libltdl.so.7 => /usr/lib/libltdl.so.7 (0xb725c000)
        libdl.so.2 => /lib/tls/i686/nosegneg/libdl.so.2 (0xb7258000)
        libSM.so.6 => /usr/lib/libSM.so.6 (0xb724e000)
        libICE.so.6 => /usr/lib/libICE.so.6 (0xb7235000)
        libX11.so.6 => /usr/lib/libX11.so.6 (0xb7118000)
        libgomp.so.1 => /usr/lib/libgomp.so.1 (0xb710a000)
        libm.so.6 => /lib/tls/i686/nosegneg/libm.so.6 (0xb70e4000)
        /lib/ld-linux.so.2 (0xb78e3000)
        libuuid.so.1 => /lib/libuuid.so.1 (0xb70de000)
        libxcb.so.1 => /usr/lib/libxcb.so.1 (0xb70c4000)
        librt.so.1 => /lib/tls/i686/nosegneg/librt.so.1 (0xb70bb000)
        libXau.so.6 => /usr/lib/libXau.so.6 (0xb70b7000)
        libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xb70b1000)

Any ideas? I would prefer to avoid upgrading to maverick because of this.
