Multiple Kexec in AWS Nitro instances fail

Bug #1869948 reported by Guilherme G. Piccoli
Affects          Status         Importance   Assigned to            Milestone
linux (Ubuntu)   Fix Released   High         Guilherme G. Piccoli
  Xenial         Fix Released   High         Guilherme G. Piccoli
  Bionic         Fix Released   High         Guilherme G. Piccoli
  Eoan           Fix Released   High         Guilherme G. Piccoli
  Focal          Fix Released   High         Guilherme G. Piccoli

Bug Description

[Impact]
* Currently, users cannot perform multiple kernel kexec loads on AWS Nitro instances (KVM-based); after the 2nd or 3rd kexec, an initrd corruption is observed, with the following signature:

 Initramfs unpacking failed: junk within compressed archive
[...]
 Kernel panic - not syncing: No working init found.
Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc7-gpiccoli+ #26 Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
Call Trace:
  dump_stack+0x6d/0x9a
  ? csum_partial_copy_generic+0x150/0x170
  panic+0x101/0x2e3
  ? do_execve+0x25/0x30
  ? rest_init+0xb0/0xb0
  kernel_init+0xfb/0x100
  ret_from_fork+0x35/0x40

* After investigation (see comment 2), we noticed that the Amazon ena network driver doesn't provide a shutdown() handler, so the device could perform a DMA transaction to a previously valid address during the next boot, corrupting kernel memory. The following patch was proposed and fixed the issue, allowing 1000 kexecs to be executed successfully with no issues observed: 428c491332bc ("net: ena: Add PCI shutdown handler to allow safe kexec") [ git.kernel.org/linus/428c491332bc ].

* Hence, we are hereby requesting an SRU for this patch. It was tested successfully on all supported series (4.4, 4.15 and 5.3) in Amazon Nitro instances, and was reviewed/acked by the ena driver team and by a kexec developer from another distro. It's worth mentioning that we proposed an upstream multi-vendor discussion about this issue: marc.info/?l=kexec&m=158299605013194

[Test case]

* The basic test procedure consists of performing multiple kexecs sequentially. AWS does not provide a full console, so in case of failure one can check the instance screenshot, or use pstore/ramoops to collect dmesg in a memory area preserved across the crash. The commands used to perform kexec are:

kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
systemctl kexec

Alternatively, one could use "--append=" instead of "--reuse-cmdline" if a change in the kexec command line is desired; also, to execute the kexec-loaded kernel, both "kexec -e" and "systemctl kexec" are equally valid.
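As an illustration only, the load step above could be wrapped in a small helper. This is a minimal sketch, not part of the bug's attachments: the function name load_kernel and the KEXEC override variable are hypothetical, and the /boot/vmlinuz-<version> and /boot/initrd.img-<version> paths assume Ubuntu's usual naming.

```shell
# Hypothetical helper: load a kernel for kexec.
# $1: kernel version (defaults to the running kernel, via uname -r)
# $2: optional command line; if empty, the current one is reused
# Assumption: Ubuntu-style /boot/vmlinuz-<version> and /boot/initrd.img-<version>.
load_kernel() {
    local version="${1:-$(uname -r)}"
    local cmdline="${2:-}"

    # Build the argument list in the positional parameters.
    set -- -l "/boot/vmlinuz-${version}" "--initrd=/boot/initrd.img-${version}"
    if [ -n "${cmdline}" ]; then
        set -- "$@" "--append=${cmdline}"
    else
        set -- "$@" --reuse-cmdline
    fi

    # KEXEC is a hypothetical override, handy for dry runs; defaults to kexec.
    "${KEXEC:-kexec}" "$@"
}
```

Calling load_kernel with no arguments (as root) would load the running kernel with the current command line; the loaded kernel is then executed with "systemctl kexec" or "kexec -e", as described above.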

* In comment 3 we proposed a script/approach to auto-test kexecs, used here to perform 1000 kexecs with the proposed patch.

[Regression Potential]

* Although the patch proposed here introduces a PCI shutdown handler, it keeps the remove handler identical and bases shutdown strongly on ena_remove(), changing only the netdev handling, following other upstream drivers. It was extensively tested and presented no issues. Also, it is self-contained and affects only one driver, so other cloud providers and non-cloud environments are not affected by the patch.

* A potential regression could manifest as a delay or issue on the reboot/shutdown path, and only if the ena driver is in use.


Changed in linux (Ubuntu Eoan):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Eoan):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
Changed in linux (Ubuntu Eoan):
status: New → Confirmed
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
Guilherme G. Piccoli (gpiccoli) wrote :

An initrd corruption in AWS Nitro (KVM-based) instances was reported when trying multiple kexecs sequentially - it usually manifests after the 2nd or 3rd kexec. By using pstore/ramoops, we collected the attached log.
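For reference, collecting such a log after the next successful boot could look like the sketch below. It assumes ramoops is already configured and that pstore is mounted at /sys/fs/pstore, where ramoops dmesg records show up as dmesg-ramoops-N files; the function name save_pstore_dmesg and the PSTORE_DIR override are hypothetical.

```shell
# Hypothetical helper: copy the dmesg records kept by pstore to a directory.
# $1: destination directory (created if missing)
# Assumption: pstore is mounted at /sys/fs/pstore (override with PSTORE_DIR).
save_pstore_dmesg() {
    local src="${PSTORE_DIR:-/sys/fs/pstore}"
    local dest="$1"
    local f

    mkdir -p "${dest}" || return 1
    for f in "${src}"/dmesg-*; do
        # The glob stays unexpanded when there are no records; skip it.
        [ -e "${f}" ] || continue
        cp "${f}" "${dest}/" || return 1
    done
}
```

For example, "save_pstore_dmesg /root/pstore-logs" would keep a copy of the records before they are removed from pstore.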

Guilherme G. Piccoli (gpiccoli) wrote :

After debugging the problem, a potential workaround was found which alleviates but doesn't fix the issue: using the "retain_initrd" kernel parameter on kexec boots, which prevents the kernel from freeing the initrd memory area. Also, it was observed that bigger initrds tend to show the problem more consistently.

After using pstore/ramoops to collect logs (and ftrace) on failure, and observing the same issue in multiple kernel versions (including mainline) and in other distros, it was clear that the cause was memory corruption. Since kexec is a fast path for reboot that does not go through the full BIOS reset, it was conjectured that an adapter not properly shut down on the kexec path could have its firmware perform an invalid memory access, in the form of a DMA write to a previously valid address, effectively corrupting an arbitrary region.

Then, it was noticed that the Amazon ena driver does not have a shutdown handler, which is used on reboot/kexec to properly quiesce devices (through the call chain device_shutdown() -> pci_device_shutdown() -> driver .shutdown() handler, if any).

If the device has no shutdown handler, the PCI layer will clear the master bit in its PCI command register, disabling the adapter. But this operation doesn't quiesce the device's firmware, and on the next boot, when the device gets activated (i.e., its master bit gets set), it may perform a buffered memory operation.

Tests on a mainline kernel, performing rmmod of the ena driver before kexec, showed that the initrd corruption no longer happened, since rmmod calls ena_remove(), which properly shuts the adapter down before the kexec.
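A minimal sketch of how the workaround could be applied: build the next kernel's command line from the current one, adding "retain_initrd" if it is not already there. The helper name cmdline_with_retain_initrd is hypothetical, and reading /proc/cmdline by default is an assumption about how the command line is obtained.

```shell
# Hypothetical helper: print a kernel command line that includes retain_initrd.
# $1: file holding the current command line (default: /proc/cmdline)
cmdline_with_retain_initrd() {
    local cmdline
    cmdline="$(cat "${1:-/proc/cmdline}")" || return 1

    # Pad with spaces so the match only hits a whole parameter.
    case " ${cmdline} " in
        *" retain_initrd "*) printf '%s\n' "${cmdline}" ;;
        *)                   printf '%s\n' "${cmdline} retain_initrd" ;;
    esac
}
```

The result can then be passed to the load step, for example: kexec -l <kernel file> --initrd <initrd file> --append="$(cmdline_with_retain_initrd)"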

Guilherme G. Piccoli (gpiccoli) wrote :

Testing multiple kexecs usually requires either two machines (one triggering ssh commands on the other, which will kexec) or, if only one machine is used, cron scripts. The latter approach was chosen to validate this Launchpad bug, by adding the following entry to the crontab:

"@reboot /bin/bash /root/kexec.sh"

The attached "kexec.sh" will perform kexecs until ${KLIMIT} is reached; the count is kept in the file "/root/kexec_cnt" (as a single number) in our example.
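The attachment itself is not reproduced here. The following is only a rough sketch of what such a loop might look like, not the attached script: the function names kexec_next and run_kexec_loop, the COUNT_FILE variable and the default KLIMIT of 1000 are assumptions, as are the Ubuntu-style /boot paths.

```shell
# Hypothetical sketch of a per-boot kexec loop (not the attached kexec.sh).
KLIMIT="${KLIMIT:-1000}"                     # number of kexecs to perform
COUNT_FILE="${COUNT_FILE:-/root/kexec_cnt}"  # holds a single number

# Record one more kexec in COUNT_FILE.
# Returns non-zero, leaving the file untouched, once KLIMIT is reached.
kexec_next() {
    local count=0
    if [ -f "${COUNT_FILE}" ]; then
        count="$(cat "${COUNT_FILE}")"
    fi
    if [ "${count}" -ge "${KLIMIT}" ]; then
        return 1
    fi
    printf '%s\n' "$((count + 1))" > "${COUNT_FILE}"
}

# Meant to run once per boot: kexec again unless the limit was reached.
# Assumption: Ubuntu-style /boot/vmlinuz-<version> and /boot/initrd.img-<version>.
run_kexec_loop() {
    if kexec_next; then
        kexec -l "/boot/vmlinuz-$(uname -r)" \
              "--initrd=/boot/initrd.img-$(uname -r)" --reuse-cmdline &&
            systemctl kexec
    fi
}
```

A script started from the "@reboot" crontab entry would end with a call to run_kexec_loop; removing the counter file resets the test.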

description: updated
Changed in linux (Ubuntu Xenial):
status: Confirmed → In Progress
Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in linux (Ubuntu Eoan):
status: Confirmed → In Progress
Changed in linux (Ubuntu Focal):
status: Confirmed → In Progress
Guilherme G. Piccoli (gpiccoli) wrote :

SRU request submitted to kernel team mailing list: https://lists.ubuntu.com/archives/kernel-team/2020-April/108684.html

Khaled El Mously (kmously) wrote :

Even though the proposed patch wasn't applied to Focal, based on Seth's comment it did make its way to Focal via upstream-stable. Therefore, marking Focal as fix-released as well.

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Khaled El Mously (kmously) wrote :

Comment #5 should have said "marking Focal as fix-committed as well"

Changed in linux (Ubuntu Disco):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Changed in linux (Ubuntu Disco):
importance: Undecided → Medium
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Guilherme G. Piccoli (gpiccoli) wrote :

I've verified this LP using AWS instances, so the kernel versions I tested were -aws flavored; this was necessary given the patch proposed here is related to the ena driver, which manages an AWS exclusive virtual NIC.

The versions tested were:

4.4.0.1105 xenial-updates
4.4.0.1106 xenial-proposed
4.15.0.1065 bionic-updates
4.15.0.1066 bionic-proposed
5.3.0.1016 eoan-updates (tested with Bionic-HWE analog)
5.3.0.1017 eoan-proposed (tested with Bionic-HWE analog)

The test was based on the [Test case] section / comment #3. I managed to reproduce the issue in all versions from -updates, whereas no version from -proposed showed the issue (20 kexecs succeeded). In the failure case, the crash on boot (initrd corruption) was observed on the 2nd or at most 3rd kexec. In Xenial (kernel 4.4), due to the "small" size of the initrd, it was necessary to install linux-modules-extra to increase the size of the file and hence expose the memory corruption.

Also, I've checked in all kernels if the symbols added by the patch were there, with the following command:

# grep "ena_remov\|ena_shut" /proc/kallsyms
ffffffffc0005690 t __ena_shutoff [ena]
ffffffffc0005770 t ena_shutdown [ena]
ffffffffc0005790 t ena_remove [ena]

The above output is from a -proposed kernel; kernels in -updates only show the ena_remove() symbol. Finally, I checked for the patch in the generic flavor trees, for X/B/E/F, and it is present in the latest tags (corresponding to the kernels in -proposed, and tag Ubuntu-5.4.0-22.26 for Focal).
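The same check can be scripted, for instance when verifying several instances. This is a small sketch under the assumption that the presence of the ena_shutdown symbol is a good enough indicator that the patch is applied; the helper name has_ena_shutdown is hypothetical.

```shell
# Hypothetical helper: succeed if the ena_shutdown symbol is listed.
# $1: symbol table to inspect (default: /proc/kallsyms)
has_ena_shutdown() {
    # Match the whole symbol name, followed by whitespace (module symbols,
    # e.g. "[ena]") or by the end of the line.
    grep -Eq ' ena_shutdown([[:space:]]|$)' "${1:-/proc/kallsyms}"
}
```

For example: has_ena_shutdown && echo "shutdown handler present". The ena module must be loaded (or built in) for the symbol to be listed.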

So, I'm hereby marking this LP as verified for all releases. See next comment for a note about Disco kernel.
Thanks,

Guilherme

tags: added: verification-done verification-done-bionic verification-done-eoan verification-done-xenial
removed: verification-needed-bionic verification-needed-eoan verification-needed-xenial
Guilherme G. Piccoli (gpiccoli) wrote :

Regarding Disco, despite the "Fix Committed" status I didn't find the patch in the latest tags from generic tree (Ubuntu-5.0.0-46.50) nor AWS tree (Ubuntu-aws-5.0.0-1024.27), so I think the patch wasn't merged (which is not a big deal, given Bionic HWE is now based on 5.3).

I've reverted the "Fix Committed" status, let me know if we should de-nominate Disco.

Thanks,

Guilherme

Changed in linux (Ubuntu Disco):
status: Fix Committed → Opinion
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-51.44

---------------
linux (5.3.0-51.44) eoan; urgency=medium

  * CVE-2020-11884
    - SAUCE: s390/mm: fix page table upgrade vs 2ndary address mode accesses

 -- Thadeu Lima de Souza Cascardo <email address hidden> Wed, 22 Apr 2020 17:35:41 -0300

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-99.100

---------------
linux (4.15.0-99.100) bionic; urgency=medium

  * CVE-2020-11884
    - SAUCE: s390/mm: fix page table upgrade vs 2ndary address mode accesses

 -- Marcelo Henrique Cerri <email address hidden> Wed, 22 Apr 2020 15:31:14 -0300

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-178.208

---------------
linux (4.4.0-178.208) xenial; urgency=medium

  * xenial/linux: 4.4.0-178.208 -proposed tracker (LP: #1870660)

  * CVE-2019-19768
    - blktrace: Protect q->blk_trace with RCU
    - blktrace: fix dereference after null check

  * Multiple Kexec in AWS Nitro instances fail (LP: #1869948)
    - net: ena: Add PCI shutdown handler to allow safe kexec

  * Insert test_bpf module will report 4 failures for ubuntu_bpf_jit on X s390x
    (LP: #1768452)
    - test_bpf: flag tests that cannot be jited on s390

  * Mounting LVM snapshots with xfs can hit kernel BUG in nvme driver
    (LP: #1869229)
    - block: fix bio_will_gap() for first bvec with offset

  * Xenial update: 4.4.217 upstream stable release (LP: #1868629)
    - NFS: Remove superfluous kmap in nfs_readdir_xdr_to_array
    - r8152: check disconnect status after long sleep
    - net: nfc: fix bounds checking bugs on "pipe"
    - bnxt_en: reinitialize IRQs when MTU is modified
    - fib: add missing attribute validation for tun_id
    - nl802154: add missing attribute validation
    - nl802154: add missing attribute validation for dev_type
    - team: add missing attribute validation for port ifindex
    - team: add missing attribute validation for array index
    - nfc: add missing attribute validation for SE API
    - nfc: add missing attribute validation for vendor subcommand
    - ipvlan: add cond_resched_rcu() while processing muticast backlog
    - ipvlan: do not add hardware address of master to its unicast filter list
    - ipvlan: egress mcast packets are not exceptional
    - ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast()
    - ipvlan: don't deref eth hdr before checking it's set
    - macvlan: add cond_resched() during multicast processing
    - net: fec: validate the new settings in fec_enet_set_coalesce()
    - slip: make slhc_compress() more robust against malicious packets
    - bonding/alb: make sure arp header is pulled before accessing it
    - net: fq: add missing attribute validation for orphan mask
    - iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn +
      add_taint
    - drm/amd/display: remove duplicated assignment to grph_obj_type
    - gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
    - KVM: x86: clear stale x86_emulate_ctxt->intercept value
    - ARC: define __ALIGN_STR and __ALIGN symbols for ARC
    - efi: Fix a race and a buffer overflow while reading efivars via sysfs
    - iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
    - iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
    - nl80211: add missing attribute validation for critical protocol indication
    - nl80211: add missing attribute validation for channel switch
    - netfilter: cthelper: add missing attribute validation for cthelper
    - iommu/vt-d: Fix the wrong printing in RHSA parsing
    - iommu/vt-d: Ignore devices with out-of-spec domain number
    - ipv6: restrict IPV6_ADDRFORM operation
    - efi: Add a sanity check to efivar_store_raw()
    - batman-adv: Fix invalid read while copying bat_iv.bcast_own
    - batman-adv: Only p...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Disco):
importance: Medium → Low
status: Opinion → Won't Fix
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
no longer affects: linux (Ubuntu Disco)