12.04 LTS - Kernel 3.2 - Xen live migration fails with Ubuntu as DomU because of gARP sent early

Bug #1154608 reported by Sylvain Munaut
This bug affects 2 people
Affects          Status         Importance   Assigned to        Milestone
linux (Ubuntu)   Fix Released   Medium       Joseph Salisbury
Precise          Fix Released   Medium       Joseph Salisbury
Quantal          Fix Released   Medium       Unassigned
Raring           Fix Released   Medium       Joseph Salisbury

Bug Description

The whole issue is detailed here:

http://xen.1045712.n5.nabble.com/lost-gARP-after-live-migration-td4531697.html

But in summary, the Xen netfront driver sends the gARP packet (which notifies peers that the VM has changed 'physical' location, so that they can update their ARP caches and switches can update their MAC tables) too early.

It is sent before the Xen dom0 (host) is ready (i.e. before the backend driver is up and connected to the bridge), so it is lost. As a result, the VM is unreachable for some time after the migration (either until the peers' caches are cleared or until the VM spontaneously generates some traffic of its own).
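
For readers unfamiliar with gARP, here is a minimal Python sketch of what such a packet looks like on the wire: a broadcast ARP request in which the sender and target IP are the same address. The function name `build_garp` and the example MAC/IP values are made up for illustration; this is not the code netfront uses, only an approximation of the frame layout.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_garp(mac: bytes, ip: str, op: int = 1) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP for (mac, ip).

    Hypothetical helper for illustration only. op=1 is an ARP request,
    op=2 an ARP reply; both forms are seen in practice.
    """
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, ARP ethertype.
    eth_header = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)

    # ARP payload: what makes it "gratuitous" is target IP == sender IP.
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        op,           # operation
        mac,          # sender hardware address
        ip_bytes,     # sender protocol address
        b"\x00" * 6,  # target hardware address (zeroed in this sketch)
        ip_bytes,     # target protocol address, same as the sender's
    )
    return eth_header + arp


frame = build_garp(bytes.fromhex("00163e000001"), "192.0.2.10")
print(len(frame))  # 42 bytes: 14-byte Ethernet header + 28-byte ARP payload
```

Switches learn the new port from the source MAC of this broadcast, and peers refresh their ARP entry for the IP, which is why losing this one frame matters after a migration.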

Kernel 3.3 has a very simple fix, which I tried, and it fixed the issue nicely:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/net/xen-netfront.c?id=08e34eb14fe4cfd934b5c169a7682a969457c4ea
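
The ordering problem can be illustrated with a toy Python model. This is only a sketch of the timing issue as described above, assuming the fix amounts to deferring the peer notification until the backend reports that it is connected; the classes `ToyBackend` and `ToyFrontend` and the function `migrate` are hypothetical, the state names only loosely mirror xenbus states, and nothing here is taken from the commit itself.

```python
from enum import Enum, auto


class BackendState(Enum):
    INIT_WAIT = auto()   # backend exists but is not yet attached to the bridge
    CONNECTED = auto()   # backend is up and connected


class ToyBackend:
    """Hypothetical stand-in for the dom0 side: drops traffic until connected."""

    def __init__(self):
        self.state = BackendState.INIT_WAIT
        self.forwarded = []

    def receive(self, packet):
        if self.state is BackendState.CONNECTED:
            self.forwarded.append(packet)
            return True
        return False  # packet is silently lost


class ToyFrontend:
    """Hypothetical stand-in for the domU driver."""

    def __init__(self, backend, notify_on):
        self.backend = backend
        self.notify_on = notify_on  # backend state that triggers the gARP

    def on_backend_state(self, state):
        if state is self.notify_on:
            self.backend.receive("gARP")


def migrate(notify_on):
    """Replay the backend state changes after a migration; return what reached the bridge."""
    backend = ToyBackend()
    frontend = ToyFrontend(backend, notify_on)
    for state in (BackendState.INIT_WAIT, BackendState.CONNECTED):
        backend.state = state
        frontend.on_backend_state(state)
    return backend.forwarded


print(migrate(BackendState.INIT_WAIT))   # [] : gARP sent too early and lost
print(migrate(BackendState.CONNECTED))   # ['gARP'] : gARP reaches the bridge
```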

Please consider adding this to the LTS kernel.

Cheers,

    Sylvain

tags: added: xen
tags: added: linux live-migration network
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: precise
Joseph Salisbury (jsalisbury) wrote :

I built a Precise test kernel with commit 08e34eb14fe4cfd934b5c169a7682a969457c4ea applied.

The test kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1154608/

Can you test that kernel and report back if it resolves this bug?

Sylvain Munaut (s-munaut) wrote : Re: [Bug 1154608] Re: 12.04 LTS - Kernel 3.2 - Xen live migration fails with Ubuntu as DomU because of gARP sent early

Hi,

> The test kernel can be downloaded from:
> http://people.canonical.com/~jsalisbury/lp1154608/

I don't see anything there.

Cheers,

    Sylvain

Joseph Salisbury (jsalisbury) wrote :

Sorry, the files should be there now.

Sylvain Munaut (s-munaut) wrote :

Hi,

> Sorry, the files should be there now.

Indeed they are.

I just tested the kernel and it works fine. I did several live
migrations with no interruption in network, and I can now see the gARP
being sent at the dom0 bridge.
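
One way to automate this kind of check is to inspect raw Ethernet frames captured on the dom0 bridge and flag the gratuitous ARPs. The sketch below is only an illustration: the function name `is_gratuitous_arp` is made up, how the frames are captured is left out, and VLAN-tagged frames are not handled.

```python
import struct

ETHERTYPE_ARP = 0x0806


def is_gratuitous_arp(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame is an IPv4 gratuitous ARP.

    Hypothetical helper: "gratuitous" is taken here to mean an ARP request
    or reply whose sender IP equals its target IP.
    """
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return False
    htype, ptype, hlen, plen, op = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return False
    sender_ip = frame[28:32]
    target_ip = frame[38:42]
    return op in (1, 2) and sender_ip == target_ip


# Hand-built example frame: broadcast ARP request for 192.0.2.10 from itself.
mac = bytes.fromhex("00163e000001")
ip = bytes([192, 0, 2, 10])
example = (
    b"\xff" * 6 + mac + struct.pack("!H", ETHERTYPE_ARP)
    + struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    + mac + ip + b"\x00" * 6 + ip
)
print(is_gratuitous_arp(example))  # True
```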

Thanks !

Cheers,

    Sylvain

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing. I'll send an SRU request to have this fix included in Precise and upstream v3.2 stable.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Precise):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: New → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Quantal):
status: New → Fix Released
Changed in linux (Ubuntu Precise):
importance: Undecided → Medium
Changed in linux (Ubuntu Quantal):
importance: Undecided → Medium
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-precise
Sylvain Munaut (s-munaut) wrote :

Tested successfully with the kernel in proposed :

Linux migrate-test 3.2.0-40-generic #63-Ubuntu SMP Wed Mar 20 17:18:21 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
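
To check a guest for the fix from a script, a rough Python sketch could compare the kernel release string against the versions mentioned in this report. It assumes an Ubuntu-style release string such as `3.2.0-40-generic`, that any 3.3 or later kernel carries the upstream fix, and that on the 3.2.0 series ABI 40 is the first one with the backport; the function name `has_garp_fix` is made up for illustration.

```python
import re

# Assumption taken from this report: the tested Precise kernel is 3.2.0-40.
FIRST_FIXED_PRECISE_ABI = 40


def has_garp_fix(release: str) -> bool:
    """Guess whether an Ubuntu kernel release string includes the gARP fix.

    Hypothetical helper; expects strings like "3.2.0-40-generic"
    (the output of `uname -r` on Ubuntu).
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(part) for part in match.groups())

    # The fix is upstream as of 3.3, so later series are assumed to include it.
    if (major, minor) >= (3, 3):
        return True
    # On the Precise 3.2.0 series, rely on the ABI number.
    if (major, minor, patch) == (3, 2, 0):
        return abi >= FIRST_FIXED_PRECISE_ABI
    return False


print(has_garp_fix("3.2.0-40-generic"))  # True
```

On a running guest, the argument could come from `platform.release()`.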

tags: added: verification-done-precise
removed: verification-needed-precise
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-40.64

---------------
linux (3.2.0-40.64) precise-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #1160017

  [ Stefan Bader ]

  * SAUCE: Revert "SAUCE: xen/pv-spinlock: Never enable interrupts in
    xen_spin_lock_slow()"

  [ Xiangliang Yu ]

  * SAUCE: PCI: define macro for marvell vendor ID
    - LP: #1159863
  * SAUCE: PCI: fix system hang issue of Marvell SATA host controller
    - LP: #1159863

linux (3.2.0-40.63) precise-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #1157785

  [ Andy Whitcroft ]

  * [Config] re-disable CONFIG_SOUND_OSS_PRECLAIM
    - LP: #1105230

  [ Luis Henriques ]

  * [Config] CONFIG_NFS_V4_1=y
    - LP: #1111416

  [ Upstream Kernel Changes ]

  * Revert "drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S"
    - LP: #1150557
  * tmpfs: fix use-after-free of mempolicy object
    - LP: #1143815
    - CVE-2013-1767
  * sunvdc: Fix off-by-one in generic_request().
    - LP: #1150557
  * genirq: Avoid deadlock in spurious handling
    - LP: #1150557
  * KVM: s390: Handle hosts not supporting s390-virtio.
    - LP: #1150557
  * workqueue: consider work function when searching for busy work items
    - LP: #1150557
  * v4l: Reset subdev v4l2_dev field to NULL if registration fails
    - LP: #1150557
  * omap_vout: find_vma() needs ->mmap_sem held
    - LP: #1150557
  * dca: check against empty dca_domains list before unregister provider
    - LP: #1150557
  * powerpc/eeh: Fix crash when adding a device in a slot with DDW
    - LP: #1150557
  * ext4: return ENOMEM if sb_getblk() fails
    - LP: #1150557
  * pcmcia/vrc4171: Add missing spinlock init
    - LP: #1150557
  * Purge existing TLB entries in set_pte_at and ptep_set_wrprotect
    - LP: #1150557
  * ARM: PXA3xx: program the CSMSADRCFG register
    - LP: #1150557
  * USB: option: add and update Alcatel modems
    - LP: #1150557
  * quota: autoload the quota_v2 module for QFMT_VFS_V1 quota format
    - LP: #1150557
  * ext4: fix possible use-after-free with AIO
    - LP: #1150557
  * s390/kvm: Fix store status for ACRS/FPRS
    - LP: #1150557
  * staging: comedi: disallow COMEDI_DEVCONFIG on non-board minors
    - LP: #1150557
  * ALSA: usb-audio: fix Roland A-PRO support
    - LP: #1150557
  * x86-32, mm: Rip out x86_32 NUMA remapping code
    - LP: #1150557
  * ALSA: hda - Release assigned pin/cvt at error path of hdmi_pcm_open()
    - LP: #1150557
  * ext4: fix race in ext4_mb_add_n_trim()
    - LP: #1150557
  * zram: Fix deadlock bug in partial read/write
    - LP: #1150557
  * Driver core: treat unregistered bus_types as having no devices
    - LP: #1150557
  * ALSA: aloop: Fix Oops while PM resume
    - LP: #1150557
  * UBIFS: fix double free of ubifs_orphan objects
    - LP: #1150557
  * tty: set_termios/set_termiox should not return -EINTR
    - LP: #1150557
  * hrtimer: Prevent hrtimer_enqueue_reprogram race
    - LP: #1150557
  * nfsd: Fix memleak
    - LP: #1150557
  * staging: comedi: check s->async for poll(), read() and write()
    - LP: #1150557
  * ACPI: Add DMI entry for Sony VGN-FW41E_H
    - LP: #1150557
  * vgacon/vt: clear buf...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
