Bug #1404558 “IPv6 related kernel panic following upgrade to 3.1...” : Bugs : linux package : Ubuntu

Stéphane Graber (stgraber) wrote on 2014-12-20:

#1

Might be worth mentioning that all affected hosts are x86 64bit Intel.

For those I've got access to, the issue happened on:
- 2x Xeon E3-1245v2
- 1x Xeon E5-2620v2
- 1x Atom C2750
- 1x Atom D2500
- 1x Core i5 750

All running on pretty standard Intel boards, so the usual set of Intel chipsets for their generation.

Brad Figg (brad-figg) wrote on 2014-12-20: Missing required logs.

#2

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1404558

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete
tags:	added: utopic

Stéphane Graber (stgraber) on 2014-12-20

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Alberto Salvia Novella (es20490446e) on 2014-12-27

Changed in linux (Ubuntu):
importance:	Undecided → Critical

Joseph Salisbury (jsalisbury) wrote on 2015-01-03:

#3

Hi Stephane,

Can you see if this issue was already fixed in the latest 3.13 upstream kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11-ckt13-trusty/

If it was not, we can bisect the issue.

It might also be worth testing the latest mainline kernel, to see if this issue came down from upstream:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19-rc2-vivid/

tags:

added: regression-update

Joseph Salisbury (jsalisbury) on 2015-01-03

tags:

added: kernel-key trusty

Andy Whitcroft (apw) wrote on 2015-01-05:

#4

Could you also give us a flavour for how reproducible this is?

If this is seemingly fixed in the lts-utopic, the -ckt13 test is likely the most informative. There is also a kernel sitting in -proposed which might be worth a shot too.

Stéphane Graber (stgraber) wrote on 2015-01-05:

#5

I don't have a reproducer other than installing the kernel and waiting 24h, as that's how long it took for some systems to panic...

From a very quick look, those kernels are mainline kernels. Unfortunately all my hosts are LXC hosts using unprivileged containers with overlayfs, so I need kernels with the Ubuntu patchset applied for them to be usable.

For now, I'll go test the current -proposed kernel on my least critical system, see if I can get that one to panic.

Stéphane Graber (stgraber) wrote on 2015-01-05:

#6

Rebooted the Xeon E5-2620v2 system on linux-image-3.13.0-44-generic now.

Stéphane Graber (stgraber) wrote on 2015-01-05:

#7

Screenshot from 2015-01-05 16:40:16.png (46.4 KiB, image/png)

Reproduced the panic with -44, same stack trace, screenshot attached. Booting the machine back on -40 now.

Andy Whitcroft (apw) wrote on 2015-01-06:

#8

This might be related to a backport issue in the upstream stable patch below:

  commit 4fab9071950c2021d846e18351e0f46a1cffd67b
  Author: Neal Cardwell <email address hidden>
  Date: Thu Aug 14 12:40:05 2014 -0400

      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()

I have produced some test kernels with the backport corrected. Could you try them to confirm whether this is indeed the underlying issue? The kernels are at the URL below:

http://people.canonical.com/~apw/lp1404558-trusty/

Please report any testing back here.
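
For readers unfamiliar with the pattern named in that commit title, here is a toy Python model of dispatching mtu_reduced() through a per-address-family table, and of what a mis-wired table does. It is only a sketch under stated assumptions: Sock, AfOps, release_cb and the two handlers are hypothetical stand-ins invented for illustration, not the kernel's structures and not the actual backport.

```python
from dataclasses import dataclass
from typing import Callable


def v4_mtu_reduced(sock: "Sock") -> str:
    # Hypothetical stand-in for an IPv4-only handler.
    if sock.family != "AF_INET":
        raise RuntimeError("IPv4 handler ran on a non-IPv4 socket")
    return "handled as IPv4"


def v6_mtu_reduced(sock: "Sock") -> str:
    # Hypothetical stand-in for an IPv6-only handler.
    if sock.family != "AF_INET6":
        raise RuntimeError("IPv6 handler ran on a non-IPv6 socket")
    return "handled as IPv6"


@dataclass
class AfOps:
    # Per-address-family operations table (illustrative, one callback only).
    mtu_reduced: Callable[["Sock"], str]


@dataclass
class Sock:
    family: str
    af_ops: AfOps


def release_cb(sock: Sock) -> str:
    # The caller never names a handler; it dispatches through the socket's table.
    return sock.af_ops.mtu_reduced(sock)


# Correct wiring: each family points at its own handler.
CORRECT = {"AF_INET": AfOps(v4_mtu_reduced), "AF_INET6": AfOps(v6_mtu_reduced)}
# Mis-wired tables, as a bad backport could produce: the handlers are swapped.
SWAPPED = {"AF_INET": AfOps(v6_mtu_reduced), "AF_INET6": AfOps(v4_mtu_reduced)}

if __name__ == "__main__":
    print(release_cb(Sock("AF_INET6", CORRECT["AF_INET6"])))
    try:
        release_cb(Sock("AF_INET6", SWAPPED["AF_INET6"]))
    except RuntimeError as exc:
        print("swapped table:", exc)
```

In this model the failure is a clean exception. In a kernel the wrong handler would instead run against a socket it was not written for, which is one plausible way to end up with a panic that only shows up when the relevant event happens to fire.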

Andy Whitcroft (apw) on 2015-01-06

Changed in linux (Ubuntu):
assignee:	nobody → Andy Whitcroft (apw)
milestone:	none → ubuntu-15.01

Andy Whitcroft (apw) on 2015-01-06

Changed in linux (Ubuntu Trusty):
status:	New → In Progress
importance:	Undecided → Critical
assignee:	nobody → Andy Whitcroft (apw)

Joseph Salisbury (jsalisbury) on 2015-01-06

tags:

added: kernel-da-key
removed: kernel-key

Stéphane Graber (stgraber) wrote on 2015-01-07:

#9

Almost 24 hours and no kernel panic so far!

Stéphane Graber (stgraber) wrote on 2015-01-08:

#10

Still no panic after 48h, let's call it good.

Andy Whitcroft (apw) wrote on 2015-01-09:

#11

Patch pushed up to kernel-team@ for SRU.

Brad Figg (brad-figg) on 2015-01-09

Changed in linux (Ubuntu Trusty):
status:	In Progress → Fix Committed

Andy Whitcroft (apw) wrote on 2015-01-12:

#12

This was a backport-specific issue; trusty alone is affected.

Changed in linux (Ubuntu):
status:	Confirmed → Invalid

Brad Figg (brad-figg) wrote on 2015-01-16:

#13

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags:

added: verification-needed-trusty

Stéphane Graber (stgraber) wrote on 2015-01-16:

#14

Booted the new kernel and so far so good. Since there is no real reproducer for this bug, I'll mark this as verification-done and will come flip it back to verification-failed if the box panics by the time you push this kernel to updates.

tags:

added: verification-done-trusty
removed: verification-needed-trusty

Stephen Frost (sfrost) wrote on 2015-01-24:

#15

I've been running the new kernel across all of my systems and have not had any problems for the past 2 days, so I'd call this good. Previously, failures were happening pretty quickly and always within a day. Thanks!

Launchpad Janitor (janitor) wrote on 2015-01-30:

#16

This bug was fixed in the package linux - 3.13.0-45.74

---------------
linux (3.13.0-45.74) trusty; urgency=low

[ Seth Forshee ]

  * Release Tracking Bug
    - LP: #1410384

[ Jesse Barnes ]

  * SAUCE: drm/i915/vlv: assert and de-assert sideband reset at boot and
    resume v3
    - LP: #1401963

[ K. Y. Srinivasan ]

  * SAUCE: storvsc: force SPC-3 compliance on win8 and win8 r2 hosts
    - LP: #1406867

[ Timo Aaltonen ]

  * SAUCE: Switch VLV/BYT to use i915_bdw.
    - LP: #1401963

[ Upstream Kernel Changes ]

  * Revert "xhci: clear root port wake on bits if controller isn't wake-up
    capable"
    - LP: #1408779
  * KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
    - LP: #1400209
  * e1000e: Fix no connectivity when driver loaded with cable out
    - LP: #1400365
  * net/mlx4_core: Enable CQE/EQE stride support
    - LP: #1400127
  * net/mlx4_core: Cache line EQE size support
    - LP: #1400127
  * net/mlx4_en: Add mlx4_en_get_cqe helper
    - LP: #1400127
  * net/mlx4_core: Introduce mlx4_get_module_info for cable module info
    reading
    - LP: #1400127
  * ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool
    support
    - LP: #1400127
  * net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap
    - LP: #1400127
  * net/mlx4_core: Add ethernet backplane autoneg device capability
    - LP: #1400127
  * ethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting
    support
    - LP: #1400127
  * net/mlx4_en: Use PTYS register to query ethtool settings
    - LP: #1400127
  * net/mlx4_en: Use PTYS register to set ethtool settings (Speed)
    - LP: #1400127
  * net/mlx4_en: Add support for setting rxvlan offload OFF/ON
    - LP: #1400127
  * net/mlx4_en: Add ethtool support for [rx|tx]vlan offload set to OFF/ON
    - LP: #1400127
  * net/mlx4_core: Prevent VF from changing port configuration
    - LP: #1400127
  * net/mlx4_en: mlx4_en_set_settings() always fails when autoneg is set
    - LP: #1400127
  * ipv4: fix nexthop attlen check in fib_nh_match
    - LP: #1408779
  * vxlan: fix a use after free in vxlan_encap_bypass
    - LP: #1408779
  * vxlan: using pskb_may_pull as early as possible
    - LP: #1408779
  * vxlan: fix a free after use
    - LP: #1408779
  * ipv4: fix a potential use after free in ip_tunnel_core.c
    - LP: #1408779
  * ax88179_178a: fix bonding failure
    - LP: #1408779
  * tcp: md5: do not use alloc_percpu()
    - LP: #1408779
  * ipv4: dst_entry leak in ip_send_unicast_reply()
    - LP: #1408779
  * drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
    - LP: #1408779
  * drivers/net: macvtap and tun depend on INET
    - LP: #1408779
  * ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
    - LP: #1408779
  * vti6: Use vti6_dev_init as the ndo_init function.
    - LP: #1408779
  * sit: Use ipip6_tunnel_init as the ndo_init function.
    - LP: #1408779
  * gre6: Move the setting of dev->iflink into the ndo_init functions.
    - LP: #1408779
  * vxlan: Do not reuse sockets for a different address family
    - LP: #1408779
  * net: sctp: fix memory leak in auth key management
    - LP: #1408779
  * smsc911x: power-up phydev before doing a software reset.
    - LP: #1408779
  * sunvdc: add cdrom and v1.1 protocol support
    - LP: #1408779
  * sunvdc: compute vdisk geometry from capacity
    - LP: #1408779
  * sunvdc: limit each sg segment to a page
    - LP: #1408779
  * vio: fix reuse of vio_dring slot
    - LP: #1408779
  * sunvdc: don't call VD_OP_GET_VTOC
    - LP: #1408779
  * sparc64: Fix crashes in schizo_pcierr_intr_other().
    - LP: #1408779
  * sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
    - LP: #1408779
  * sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
    - LP: #1408779
  * sparc64: Fix constraints on swab helpers.
    - LP: #1408779
  * inetdevice: fixed signed integer overflow
    - LP: #1408779
  * ipv4: Fix incorrect error code when adding an unreachable route
    - LP: #1408779
  * ieee802154: fix error handling in ieee802154fake_probe()
    - LP: #1408779
  * qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
    - LP: #1408779
  * pptp: fix stack info leak in pptp_getname()
    - LP: #1408779
  * ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
    - LP: #1408779
  * aio: fix uncorrent dirty pages accouting when truncating AIO ring
    buffer
    - LP: #1408779
  * spi: dw: Fix dynamic speed change.
    - LP: #1408779
  * USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick
    - LP: #1408779
  * iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask
    - LP: #1408779
  * usb: serial: ftdi_sio: add PIDs for Matrix Orbital products
    - LP: #1408779
  * USB: keyspan: fix tty line-status reporting
    - LP: #1408779
  * USB: keyspan: fix overrun-error reporting
    - LP: #1408779
  * USB: ssu100: fix overrun-error reporting
    - LP: #1408779
  * nfsd: correctly define v4.2 support attributes
    - LP: #1408779
  * SUNRPC: Fix locking around callback channel reply receive
    - LP: #1408779
  * nfsd: Fix slot wake up race in the nfsv4.1 callback code
    - LP: #1408779
  * bnx2fc: do not add shared skbs to the fcoe_rx_list
    - LP: #1408779
  * scsi: add Intel Multi-Flex to scsi scan blacklist
    - LP: #1408779
  * ARM: 8216/1: xscale: correct auxiliary register in suspend/resume
    - LP: #1408779
  * USB: xhci: don't start a halted endpoint before its new dequeue is set
    - LP: #1408779
  * USB: xhci: Reset a halted endpoint immediately when we encounter a
    stall.
    - LP: #1408779
  * usb: xhci: rework root port wake bits if controller isn't allowed to
    wakeup
    - LP: #1408779
  * ALSA: hda - Limit 40bit DMA for AMD HDMI controllers
    - LP: #1408779
  * PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
    - LP: #1408779
  * gpu/radeon: Set flag to indicate broken 64-bit MSI
    - LP: #1408779
  * sound/radeon: Move 64-bit MSI quirk from arch to driver
    - LP: #1408779
  * powerpc/powernv: Honor the generic "no_64bit_msi" flag
    - LP: #1408779
  * powerpc/pseries: Honor the generic "no_64bit_msi" flag
    - LP: #1408779
  * MIPS: Loongson: Make platform serial setup always built-in.
    - LP: #1408779
  * net/ping: handle protocol mismatching scenario
    - LP: #1408779
  * usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000
    - LP: #1408779
  * Input: xpad - use proper endpoint type
    - LP: #1408779
  * powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
    - LP: #1408779
  * ARM: 8222/1: mvebu: enable strex backoff delay
    - LP: #1408779
  * ARM: 8226/1: cacheflush: get rid of restarting block
    - LP: #1408779
  * staging: r8188eu: Add new device ID for DLink GO-USB-N150
    - LP: #1408779
  * btrfs: zero out left over bytes after processing compression streams
    - LP: #1408779
  * smiapp: Only some selection targets are settable
    - LP: #1408779
  * i2c: omap: fix NACK and Arbitration Lost irq handling
    - LP: #1408779
  * drm/nouveau/gf116: remove copy1 engine
    - LP: #1408779
  * drm/i915: More cautious with pch fifo underruns
    - LP: #1408779
  * drm/i915: Unlock panel even when LVDS is disabled
    - LP: #1408779
  * AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
    - LP: #1408779
  * sata_fsl: fix error handling of irq_of_parse_and_map
    - LP: #1408779
  * drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with
    3.18.0-rc6
    - LP: #1408779
  * mm: frontswap: invalidate expired data on a dup-store failure
    - LP: #1408779
  * mm/vmpressure.c: fix race in vmpressure_work_fn()
    - LP: #1408779
  * drivers/input/evdev.c: don't kfree() a vmalloc address
    - LP: #1408779
  * mm: fix swapoff hang after page migration and fork
    - LP: #1408779
  * mm: fix anon_vma_clone() error treatment
    - LP: #1408779
  * slab: fix nodeid bounds check for non-contiguous node IDs
    - LP: #1408779
  * ahci: disable MSI on SAMSUNG 0xa800 SSD
    - LP: #1408779
  * i2c: davinci: generate STP always when NACK is received
    - LP: #1408779
  * ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic
    - LP: #1408779
  * ipv6: gre: fix wrong skb->protocol in WCCP
    - LP: #1408779
  * Fix race condition between vxlan_sock_add and vxlan_sock_release
    - LP: #1408779
  * tg3: fix ring init when there are more TX than RX channels
    - LP: #1408779
  * net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
    - LP: #1408779
  * rtnetlink: release net refcnt on error in do_setlink()
    - LP: #1408779
  * net: mvneta: fix Tx interrupt delay
    - LP: #1408779
  * net: mvneta: fix race condition in mvneta_tx()
    - LP: #1408779
  * net: sctp: use MAX_HEADER for headroom reserve in output path
    - LP: #1408779
  * Linux 3.13.11-ckt13
    - LP: #1408779
  * ipv6: fix swapped ipv4/ipv6 mtu_reduced callbacks
    - LP: #1404558
  * arm64: Fix machine_shutdown() definition
    - LP: #1404335
  * arm64: Fix deadlock scenario with smp_send_stop()
    - LP: #1404335
  * iwlwifi: mvm: a few more SKUs for 7260 and 3160
  * iwlwifi: fix and add 7265 series HW IDs
    - LP: #1408222
 -- Seth Forshee <seth.forshee@canonical.com>   Tue, 13 Jan 2015 11:37:56 -0600
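
With a changelog this long, it can help to pull out just the entries tagged with one bug number. The following is a minimal sketch, assuming the layout seen above ("* title" lines, optional indented continuation lines, then "- LP: #NNN"); entries_for_bug is a hypothetical helper written for this note, not part of any Ubuntu tool, and SAMPLE is a short excerpt of the entries above.

```python
import re


def entries_for_bug(changelog: str, bug: int) -> list:
    """Return titles of changelog entries followed by '- LP: #<bug>'."""
    found = []
    title_lines = []
    for line in changelog.splitlines():
        stripped = line.strip()
        if stripped.startswith("* "):
            # A new entry starts; remember its title.
            title_lines = [stripped[2:]]
        elif re.fullmatch(r"- LP: #\d+", stripped):
            # A bug tag closes the current entry.
            if int(stripped.rsplit("#", 1)[1]) == bug and title_lines:
                found.append(" ".join(title_lines))
        elif stripped and title_lines and not stripped.startswith(("[", "--")):
            # Wrapped title text continues on an indented line.
            title_lines.append(stripped)
    return found


SAMPLE = """\
  * Linux 3.13.11-ckt13
    - LP: #1408779
  * ipv6: fix swapped ipv4/ipv6 mtu_reduced callbacks
    - LP: #1404558
  * USB: xhci: Reset a halted endpoint immediately when we encounter a
    stall.
    - LP: #1408779
"""

if __name__ == "__main__":
    print(entries_for_bug(SAMPLE, 1404558))
    # prints ['ipv6: fix swapped ipv4/ipv6 mtu_reduced callbacks']
```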

Changed in linux (Ubuntu Trusty):
status:	Fix Committed → Fix Released

Richard van der Hoff (richvdh) wrote on 2015-02-27:

#17

dmesg.txt (1.6 KiB, text/plain)

I was previously seeing this problem; I've now upgraded to 3.13.0-46 and am seeing a similar, but slightly different, panic (see attached).

Stéphane Graber (stgraber) wrote on 2015-02-28:

#18

I can confirm that something's broken with the recent kernel update...

Stéphane Graber (stgraber) wrote on 2015-02-28:

#19

Let's file a new bug for that one.

Stéphane Graber (stgraber) wrote on 2015-02-28:

#20

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1426618

Adam Conrad (adconrad) on 2015-03-02

Changed in linux (Ubuntu Trusty):
status:	Fix Released → Fix Committed
tags:	removed: verification-done-trusty

Launchpad Janitor (janitor) wrote on 2015-03-03:

#21

This bug was fixed in the package linux - 3.13.0-46.77

---------------
linux (3.13.0-46.77) trusty; urgency=low

[ Seth Forshee ]

  * Revert "ipv6: fix swapped ipv4/ipv6 mtu_reduced callbacks"
    - LP: #1404558
  * Release Tracking Bug
    - LP: #1427292
-- Seth Forshee <email address hidden> Mon, 02 Mar 2015 11:33:20 -0600

Changed in linux (Ubuntu Trusty):
status:	Fix Committed → Fix Released

Oliver Weis (oliver-c) wrote on 2015-03-03:

#23

Short feedback on the "fix". I have 2 KVM-based 14.04 64-bit systems which were not affected by this bug UNTIL the 3.13.0-46.77 fix was released. On one of the two systems everything seems fine; on the other, IPv6 is not available directly after booting, so e.g. nginx cannot bind to the configured IPv6 address.

On BOTH systems I see this in kern.log:

Mar 3 07:14:56 sigma kernel: [ 8.449276] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

But checking old kern.log files, it has been this way for a long time, so this is not the issue.

On the system which works fine anyway I see this:

Mar 3 07:11:43 omega kernel: [ 21.556845] NFSD: starting 90-second grace period (net ffffffff81cdaa00)
Mar 3 07:11:44 omega kernel: [ 22.909888] random: nonblocking pool is initialized
Mar 3 07:11:45 omega kernel: [ 23.566021] ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 3 07:11:45 omega kernel: [ 23.573073] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Mar 3 07:11:45 omega kernel: [ 23.637709] ip6_tables: (C) 2000-2006 Netfilter Core Team

So after 23 seconds ip_tables and ip6_tables are running and IPv6 is working.

On the system where IPv6 remains unavailable for about 2 minutes, and nginx therefore refuses to start, I see this:

Mar 3 07:14:57 sigma kernel: [ 9.748740] NFSD: starting 90-second grace period (net ffffffff81cdaa00)
Mar 3 07:16:26 sigma kernel: [ 102.269932] ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 3 07:16:26 sigma kernel: [ 102.279642] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Mar 3 07:16:26 sigma kernel: [ 102.304003] ip6_tables: (C) 2000-2006 Netfilter Core Team
Mar 3 07:16:50 sigma kernel: [ 126.135981] perf samples too long (2621 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Mar 3 07:23:11 sigma kernel: [ 507.343615] perf samples too long (5025 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
Mar 3 08:07:24 sigma kernel: [ 3159.511565] perf samples too long (10036 > 10000), lowering kernel.perf_event_max_sample_rate to 12500

This second system is an Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz KVM-based system with 12GB and 4 cores; the first system, which works fine, is an Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz based system with 8GB and 2 cores.

As you can see, it takes about 2 minutes until ip_tables and ip6_tables start on the second system. As a result nginx does not start up at all and has to be restarted manually once IPv6 is available.

The "perf samples too long" message is also new; I have not seen it in the logs before. I am unsure whether it is related to this problem or to the changes made from 3.13.0-46.75 -> 3.13.0-46.76 a few days ago.
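
To put a number on the delay, the uptime stamps in the two excerpts can be compared directly. This is a rough sketch, assuming the kern.log line shape shown above ("kernel: [ seconds] message"); uptime_of and gap are hypothetical helpers written for this comment, and the two strings are copied from the excerpts.

```python
import re

# Matches the "kernel: [ 102.269932] message" part of a kern.log line.
LINE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] (.*)")


def uptime_of(log: str, needle: str) -> float:
    """Uptime in seconds of the first kernel message containing `needle`."""
    for line in log.splitlines():
        m = LINE.search(line)
        if m and needle in m.group(2):
            return float(m.group(1))
    raise ValueError(f"no kernel message containing {needle!r}")


def gap(log: str) -> float:
    """Seconds between the NFSD message and the ip_tables message."""
    return uptime_of(log, "ip_tables:") - uptime_of(log, "NFSD:")


SIGMA = """\
Mar 3 07:14:57 sigma kernel: [ 9.748740] NFSD: starting 90-second grace period (net ffffffff81cdaa00)
Mar 3 07:16:26 sigma kernel: [ 102.269932] ip_tables: (C) 2000-2006 Netfilter Core Team
"""

OMEGA = """\
Mar 3 07:11:43 omega kernel: [ 21.556845] NFSD: starting 90-second grace period (net ffffffff81cdaa00)
Mar 3 07:11:45 omega kernel: [ 23.566021] ip_tables: (C) 2000-2006 Netfilter Core Team
"""

if __name__ == "__main__":
    print(f"sigma: {gap(SIGMA):.1f} s")  # prints "sigma: 92.5 s"
    print(f"omega: {gap(OMEGA):.1f} s")  # prints "omega: 2.0 s"
```

So on sigma roughly 92 seconds pass between the NFSD line and ip_tables loading, against about 2 seconds on omega. The numbers alone do not say what is blocking during that window.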

IPv6 related kernel panic following upgrade to 3.13.0-43

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Invalid	Critical	Andy Whitcroft	Ubuntu ubuntu-15.01
	Trusty	Fix Released	Critical	Andy Whitcroft
