Trusty - Null pointer dereference at queue_userspace_packet+0x1f/0x2d0 [openvswitch]

Bug #1497048 reported by Dave Chiluk
This bug affects 1 person
Affects           Status        Importance  Assigned to   Milestone
linux (Ubuntu)    Invalid       Medium      Dave Chiluk
  Trusty          Fix Released  Medium      Dave Chiluk

Bug Description

[Impact]

 * With certain complex network configurations, such as those found in OpenStack clouds, the kernel crashes with the stack trace below.

 * We have observed kernel panics when an openvswitch bridge is
populated with virtual devices (veth, for example) that have expansive
feature sets that include NETIF_F_GSO_GRE.

The failure occurs when foreign GRE-encapsulated traffic
(explicitly not including the initial packets of a connection) arrives at
the system (likely via a switch flood event). The packets are GRO-accumulated
and passed to OVS receive processing. Because the connection is not in the
OVS kernel datapath flow table, the call path is:

ovs_dp_upcall ->
 queue_gso_packets ->
  __skb_gso_segment(skb, NETIF_F_SG, false)

Without commit 1e16aa3ddf863c6b9f37eddf52503230a62dedb3, __skb_gso_segment() returns NULL, because the device's features (including NETIF_F_GSO_GRE) are used in place of the NETIF_F_SG feature supplied by the caller. The kernel then panics on a subsequent dereference of that NULL pointer in queue_userspace_packet().

[Test Case]

 * We have no easy reproduction procedure.

[Regression Potential]

 * Both patches are pulled from upstream, but have been neither accepted nor rejected as stable patches.
Stable threads:
http://marc.info/?l=linux-netdev&m=143631594021618&w=2
http://marc.info/?l=linux-netdev&m=143951671004053&w=2

 * The patched kernel has now been running for 50 days, without a related incident, in a large cloud where the issue previously occurred frequently.

[Other Info]

 * Commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 is included as well, since it avoids possible NULL dereferences in similar areas of code. We would therefore like to see both patches included.
________________________________________________________________________
[415165.417433] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a3
[415165.417759] IP: [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
[415165.418073] PGD 0
[415165.418161] Oops: 0000 [#1] SMP
[415165.418299] Modules linked in: l2tp_eth l2tp_netlink l2tp_core vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch gre vxlan ip_tunnel dm_crypt gpio_ich dm_multipath bridge scsi_dh stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel joydev kvm shpchp sb_edac ipmi_si edac_core acpi_power_meter lpc_ich mac_hid xfs btrfs xor raid6_pq libcrc32c ses enclosure hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
[415165.421570] aesni_intel ixgbe igb aes_x86_64 lrw dca gf128mul glue_helper ptp ablk_helper usbhid cryptd megaraid_sas pps_core hid mdio i2c_algo_bit wmi
[415165.427942] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.13.0-53-generic #89-Ubuntu
[415165.440183] Hardware name: Cisco Systems Inc UCSC-C240-M3S/UCSC-C240-M3S, BIOS C240M3.2.0.1a.0.042820140036 04/28/2014
[415165.452693] task: ffff882012d01800 ti: ffff882012cfc000 task.ti: ffff882012cfc000
[415165.465847] RIP: 0010:[<ffffffffa015e24f>] [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
[415165.480003] RSP: 0018:ffff88203fce3b88 EFLAGS: 00010296
[415165.487411] RAX: 0000000000000000 RBX: ffff88203fce3ce8 RCX: ffff88203fce3ce8
[415165.502430] RDX: 0000000000000000 RSI: 000000000000000e RDI: ffffffff81cdab00
[415165.517448] RBP: ffff88203fce3bc8 R08: 0000000000000001 R09: 0000000000000000
[415165.532701] R10: 0000000000410000 R11: 000000000f9365e3 R12: ffff88203fce3ce8
[415165.548698] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000e
[415165.564653] FS: 0000000000000000(0000) GS:ffff88203fce0000(0000) knlGS:0000000000000000
[415165.580681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[415165.588725] CR2: 00000000000000a3 CR3: 0000000001c0e000 CR4: 00000000000427e0
[415165.604495] Stack:
[415165.612127] ffffffff81d1ca68 ffff881fbd6c6c00 0000000000000009 0000000000000000
[415165.627360] ffff88203fce3ce8 0000000000000000 000000000000000e 0000000000000000
[415165.642642] ffff88203fce3cb8 ffffffffa015e5a1 0000000000000010 ffffffff81cdab00
[415165.657955] Call Trace:
[415165.665405] <IRQ>
[415165.665500]
[415165.672684] [<ffffffffa015e5a1>] queue_gso_packets+0xa1/0x1f0 [openvswitch]
[415165.680015] [<ffffffffa015de7b>] ? ovs_execute_actions+0x2b/0x30 [openvswitch]
[415165.694425] [<ffffffffa01607f5>] ovs_dp_upcall+0xe5/0xf0 [openvswitch]
[415165.701807] [<ffffffffa016090f>] ovs_dp_process_received_packet+0x10f/0x120 [openvswitch]
[415165.716228] [<ffffffffa0166aca>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[415165.723591] [<ffffffffa0167391>] netdev_frame_hook+0xc1/0x120 [openvswitch]
[415165.730799] [<ffffffff81626892>] __netif_receive_skb_core+0x262/0x840
[415165.737909] [<ffffffff81626e88>] __netif_receive_skb+0x18/0x60
[415165.744824] [<ffffffff81627a1e>] process_backlog+0xae/0x1a0
[415165.751644] [<ffffffff81627272>] net_rx_action+0x152/0x250
[415165.758248] [<ffffffff8106cc6c>] __do_softirq+0xec/0x2c0
[415165.764694] [<ffffffff8106d1b5>] irq_exit+0x105/0x110
[415165.770968] [<ffffffff81735c26>] do_IRQ+0x56/0xc0
[415165.777058] [<ffffffff8172b32d>] common_interrupt+0x6d/0x6d
[415165.783041] <EOI>
[415165.783127]
[415165.788840] [<ffffffff815d523f>] ? cpuidle_enter_state+0x4f/0xc0
[415165.794659] [<ffffffff815d5369>] cpuidle_idle_call+0xb9/0x1f0
[415165.800468] [<ffffffff8101d34e>] arch_cpu_idle+0xe/0x30
[415165.806126] [<ffffffff810bf0a5>] cpu_startup_entry+0xc5/0x290
[415165.811862] [<ffffffff810414dd>] start_secondary+0x21d/0x2d0
[415165.817479] Code: 32 74 04 48 89 71 08 5b 5d c3 66 90 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 49 89 d6 41 55 41 54 53 48 89 cb 48 83 ec 18 <f6> 82 a3 00 00 00 10 48 89 7d c8 48 c7 45 d0 00 00 00 00 0f 85
[415165.834611] RIP [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
[415165.845643] RSP <ffff88203fce3b88>
[415165.851171] CR2: 00000000000000a3
_________________________________________________________________________________________

After analysis, we provided a 3.13 kernel patched with commits 1e16aa3ddf863c6b9f37eddf52503230a62dedb3 and
330966e501ffe282d7184fde4518d5e0c24bc7f8. With that kernel, the previously fairly consistent crash no longer occurs.

We attempted to push the patches through the upstream stable process here:
http://marc.info/?l=linux-netdev&m=143631594021618&w=2
and again here:
http://marc.info/?l=linux-netdev&m=143951671004053&w=2
Unfortunately, upstream stable has yet to accept them.

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1497048

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Dave Chiluk (chiluk)
description: updated
Dave Chiluk (chiluk)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Medium
Luis Henriques (henrix)
Changed in linux (Ubuntu):
status: In Progress → Invalid
Luis Henriques (henrix)
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Dave Chiluk (chiluk)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Dave Chiluk (chiluk)
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Dave Chiluk (chiluk)
tags: added: verification-done-trusty
removed: verification-needed-trusty
Dave Chiluk (chiluk) wrote :

An external user reported that this resolved their issue, so I'm marking this verification-done-trusty. Unfortunately, there is no easy way to reproduce it.

Mathew Hodson (mhodson)
Changed in linux (Ubuntu Trusty):
milestone: none → trusty-updates
Changed in linux (Ubuntu):
milestone: trusty-updates → none
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-66.108

---------------
linux (3.13.0-66.108) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1503713

  [ Andy Whitcroft ]

  * Revert "SAUCE: aufs3: mmap: Fix races in madvise_remove() and
    sys_msync()"
    - LP: #1503655

  [ Ben Hutchings ]

  * SAUCE: aufs3: mmap: Fix races in madvise_remove() and sys_msync()
    - LP: #1503655
    - CVE-2015-7312

linux (3.13.0-66.107) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1503021

  [ Ben Hutchings ]

  * SAUCE: aufs3: mmap: Fix races in madvise_remove() and sys_msync()
    - CVE-2015-7312

  [ John Johansen ]

  * SAUCE: (no-up) apparmor: fix mount not handling disconnected paths
    - LP: #1496430

  [ Upstream Kernel Changes ]

  * mmc: sdhci-pci: set the clear transfer mode register quirk for O2Micro
    - LP: #1472843
  * mmc: sdhci: Add a quirk for AMD SDHC transfer mode register need to be
    cleared for cmd without data
    - LP: #1472843
  * n_tty: Fix poll() when TIME_CHAR and MIN_CHAR == 0
    - LP: #1397976
  * net: make skb_gso_segment error handling more robust
    - LP: #1497048
  * net: gso: use feature flag argument in all protocol gso handlers
    - LP: #1497048
  * md/raid10: always set reshape_safe when initializing reshape_position.
    - LP: #1500810
  * md: flush ->event_work before stopping array.
    - LP: #1500810
  * ipv6: addrconf: validate new MTU before applying it
    - LP: #1500810
  * virtio-net: drop NETIF_F_FRAGLIST
    - LP: #1500810
  * RDS: verify the underlying transport exists before creating a
    connection
    - LP: #1500810
  * xen/gntdev: convert priv->lock to a mutex
    - LP: #1500810
  * xen/gntdevt: Fix race condition in gntdev_release()
    - LP: #1500810
  * PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition
    - LP: #1500810
  * nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem
    - LP: #1500810
  * crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer
    - LP: #1500810
  * xen-blkfront: don't add indirect pages to list when !feature_persistent
    - LP: #1500810
  * xen-blkback: replace work_pending with work_busy in
    purge_persistent_gnt()
    - LP: #1500810
  * USB: sierra: add 1199:68AB device ID
    - LP: #1500810
  * regmap: regcache-rbtree: Clean new present bits on present bitmap
    resize
    - LP: #1500810
  * target/iscsi: Fix double free of a TUR followed by a solicited NOPOUT
    - LP: #1500810
  * rbd: fix copyup completion race
    - LP: #1500810
  * md/raid1: extend spinlock to protect raid1_end_read_request against
    inconsistencies
    - LP: #1500810
  * target: REPORT LUNS should return LUN 0 even for dynamic ACLs
    - LP: #1500810
  * MIPS: Fix sched_getaffinity with MT FPAFF enabled
    - LP: #1500810
  * xhci: fix off by one error in TRB DMA address boundary check
    - LP: #1500810
  * perf: Fix fasync handling on inherited events
    - LP: #1500810
  * mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
    - LP: #1500810
  * MIPS: Make set_pte() SMP safe.
    - LP: #1500810
  * ipc: modify message queue accounting to not take kernel data structures
    into account
    - ...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released