aws: Support hibernation on Graviton

Bug #2060992 reported by Philip Cox
Affects             Status         Importance  Assigned to  Milestone
linux-aws (Ubuntu)  In Progress    Undecided   Philip Cox
  Jammy             Fix Committed  Undecided   Philip Cox
  Mantic            Fix Committed  Undecided   Philip Cox
  Noble             Fix Released   Undecided   Philip Cox

Bug Description

SRU Justification:

[Impact]
This change has two parts:
  - KVM and guest support for the PSCI SYSTEM_OFF2 (hibernate) call
  - Guest kernel support for clean boot on demand

For KVM and guest support for the PSCI SYSTEM_OFF2 (hibernate) call:

PSCI v1.3 adds support for SYSTEM_OFF2, which is analogous to the ACPI S4 state.

This will allow hosting environments to determine that a guest is hibernated rather than just powered off, and ensure that they preserve the virtual environment appropriately to allow the guest to resume safely (or bump the hardware_signature in the FACS to trigger a clean reboot instead).

For Guest kernel support for clean boot on demand:

The FACS (Firmware ACPI Control Structure) is optional, but when present it can be used to communicate the hardware_signature field. If this field has changed when the guest wakes from hibernation, the guest should perform a clean reboot rather than resume from the hibernation image.

On hardware-reduced platforms[0] the FACS may exist, but it is not currently exposed.
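
As a rough illustration of the check described above, the sketch below reads the hardware signature from a raw FACS dump and decides between resuming and a clean boot. It assumes the ACPI layout in which the 32-bit Hardware Signature sits at byte offset 8 of the FACS, stored little-endian. The function names, and the idea of comparing against a value saved at hibernate time from a script, are hypothetical and are not taken from the kernel patches.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the kernel patches.

# Print the 32-bit hardware signature from a raw FACS table dump.
# Assumes the ACPI FACS layout: bytes 8..11, little-endian.
facs_hw_signature() {
    # od prints the four bytes in file order; reverse them for display.
    set -- $(od -An -tx1 -j8 -N4 "$1")
    printf '0x%s%s%s%s\n' "$4" "$3" "$2" "$1"
}

# Decide what to do on wake-up, given the signature saved at hibernate
# time ($1) and the signature seen now ($2).
resume_or_clean_boot() {
    if [ "$1" = "$2" ]; then
        echo resume
    else
        echo clean-boot
    fi
}
```

On a running guest the table may be readable by root at /sys/firmware/acpi/tables/FACS, if the firmware provides one and the kernel exposes it.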

[Fix]

The changes for KVM and guest support for the PSCI SYSTEM_OFF2 (hibernate) call come from:
     https://<email address hidden>

The changes for Guest kernel support for clean boot on demand come from:
      https://<email address hidden>

Latest patches have been picked from:
   - noble/mantic: https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate-6.8
   - jammy: https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate-5.15

[Test Plan]
AWS test.
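
Before running the AWS tests, it can help to confirm that the booted kernel offers hibernation at all. A minimal sketch, assuming the standard /sys/power/state interface, where the word "disk" indicates suspend-to-disk support. The function name is made up for this example.

```shell
#!/bin/sh
# Sketch only: supports_hibernate is a hypothetical helper name.
# Succeeds if the given power-state file (default: /sys/power/state)
# lists "disk", i.e. the kernel offers suspend-to-disk.
supports_hibernate() {
    grep -qw disk "${1:-/sys/power/state}"
}

# Example use on the instance:
#   supports_hibernate && echo "hibernation available"
```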

[Where problems could occur]
On hardware-reduced platforms that incorrectly support or advertise the FACS, hibernation may break if the firmware returns a hardware signature that changes.

[Other info]
SF# 00383181

[0]: See Section 4.1 of the ACPI spec for info on hardware-reduced platforms.
https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/04_ACPI_Hardware_Specification/ACPI_Hardware_Specification.html

Philip Cox (philcox)
description: updated
dwmw2 (dwmw2) wrote :

The ACPICA patch is merged upstream: https://github.com/acpica/acpica/commit/b3496dece6de2709373ad7338698ce91dec5215d

So I've reposted the kernel patches to reference the ACPICA commit ID:
https://<email address hidden>/

As before, the full set of patches is at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate-6.8

Philip Cox (philcox)
Changed in linux-aws (Ubuntu Mantic):
assignee: nobody → Philip Cox (philcox)
status: New → In Progress
Changed in linux-aws (Ubuntu Noble):
status: New → In Progress
Philip Cox (philcox)
Changed in linux-aws (Ubuntu Jammy):
status: New → In Progress
Philip Cox (philcox)
summary: - aws: Guest kernel support for clean boot on demand
+ aws: Support hibernation on Graviton
Philip Cox (philcox)
description: updated
Changed in linux-aws (Ubuntu Jammy):
assignee: nobody → Philip Cox (philcox)
Philip Cox (philcox)
description: updated
Philip Cox (philcox)
Changed in linux-aws (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-aws (Ubuntu Mantic):
status: In Progress → Fix Committed
Changed in linux-aws (Ubuntu Noble):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/6.5.0-1021.21 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-aws' to 'verification-done-mantic-linux-aws'. If the problem still exists, change the tag 'verification-needed-mantic-linux-aws' to 'verification-failed-mantic-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-aws-v2 verification-needed-mantic-linux-aws
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/5.15.0-1063.69 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws' to 'verification-done-jammy-linux-aws'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws' to 'verification-failed-jammy-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-v2 verification-needed-jammy-linux-aws
Seth Carolan (secarola) wrote :

Batch-tested the patched kernel and achieved a 99+% success rate (4,175/4,200 test runs) on CLI/console-initiated hibernate/resume cycles across all ARM-supported AWS EC2 instance families: C6g(d)(n), C7g(d), M6g(d), M7g(d), R6g(d), R7g(d), T4g.

High level testing details:
1.) Spun up instance with patched AMI + this hibinit-agent patch (https://git.launchpad.net/~secarola/ubuntu/+source/ec2-hibinit-agent/commit/?h=applied/ubuntu/jammy-devel&id=034ec3ffdc8cbd9d319aa5815f02d60ec3e27f93).
2.) Started a bash script "heartbeat" on the instance, pushing a timestamp to a DynamoDB table every 30 seconds.
3.) Hibernated the instance through the AWS CLI.
4.) Resumed the instance through the AWS CLI.
5.) Confirmed that "heartbeat" updates to the DynamoDB table continued after resume.
6.) Repeated once more.
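
Steps 3 and 4 above map onto two AWS CLI calls. The sketch below wraps them: stop-instances with --hibernate and start-instances are standard aws ec2 subcommands, while the function names, the AWS override variable, and any instance ID passed in are placeholders for illustration. It assumes the instance was launched with hibernation enabled.

```shell
#!/bin/sh
# Sketch only: function names and the AWS variable are illustrative.
# Set AWS="echo aws" to print the commands instead of running them.
AWS="${AWS:-aws}"

# Ask EC2 to hibernate (rather than plainly stop) the given instance.
hibernate_instance() {
    $AWS ec2 stop-instances --instance-ids "$1" --hibernate
}

# Start the instance again; EC2 resumes it from the hibernation image.
resume_instance() {
    $AWS ec2 start-instances --instance-ids "$1"
}
```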

Manually tested instance-initiated hibernation successfully.
High level testing details:
1.) Spun up instance with patched AMI + this hibinit-agent patch (https://git.launchpad.net/~secarola/ubuntu/+source/ec2-hibinit-agent/commit/?h=applied/ubuntu/jammy-devel&id=034ec3ffdc8cbd9d319aa5815f02d60ec3e27f93).
2.) Connected to the instance and started a bash script "heartbeat" that overwrites a text file with a new timestamp every second:
#!/bin/bash
# Heartbeat: overwrite text.txt with the current time once per second.
while :
do
        date > text.txt
        sleep 1
done
3.) Hibernated the instance from the guest OS:
sudo swapon --priority=32767 /swap-hibinit
sudo systemctl hibernate
4.) Confirmed that the hosting environment reported the instance as hibernated and not shut down.
5.) Resumed the instance through the AWS CLI.
6.) Connected to the instance and confirmed the date was still being written to the text file:
cat text.txt
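
The final check can be automated by sampling the file twice and confirming that the content changed. A minimal sketch, assuming the heartbeat script above is writing to text.txt about once per second; the function name and the sampling interval are arbitrary choices.

```shell
#!/bin/sh
# Sketch only: heartbeat_alive is a hypothetical helper.
# Succeeds if the file's content changes within the given number of
# seconds (default 3), which suggests the heartbeat loop is still running.
heartbeat_alive() {
    before=$(cat "$1")
    sleep "${2:-3}"
    after=$(cat "$1")
    [ "$before" != "$after" ]
}

# Example use after resume:
#   heartbeat_alive text.txt && echo "heartbeat survived hibernation"
```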

tags: added: verification-done-jammy-linux-aws
removed: verification-needed-jammy-linux-aws
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/6.8.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-aws' to 'verification-done-noble-linux-aws'. If the problem still exists, change the tag 'verification-needed-noble-linux-aws' to 'verification-failed-noble-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-aws-v2 verification-needed-noble-linux-aws
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 6.8.0-1009.9

---------------
linux-aws (6.8.0-1009.9) noble; urgency=medium

  * noble/linux-aws: 6.8.0-1009.9 -proposed tracker (LP: #2064325)

  * aws: Support hibernation on Graviton (LP: #2060992)
    - SAUCE: firmware/psci: Add definitions for PSCI v1.3 specification (ALPHA)
    - SAUCE: KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
    - SAUCE: KVM: arm64: Add support for PSCI v1.2 and v1.3
    - SAUCE: KVM: selftests: Add test for PSCI SYSTEM_OFF2
    - SAUCE: KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
    - SAUCE: arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
    - SAUCE: ACPICA: Detect FACS even for hardware reduced platforms
    - SAUCE: arm64: acpi: Honour firmware_signature field of FACS, if it exists
    - [Config]: Enable hibernate on arm64

  [ Ubuntu: 6.8.0-34.34 ]

  * noble/linux: 6.8.0-34.34 -proposed tracker (LP: #2065167)
  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2024.04.29)

  [ Ubuntu: 6.8.0-32.32 ]

  * noble/linux: 6.8.0-32.32 -proposed tracker (LP: #2064344)
  * Packaging resync (LP: #1786013)
    - [Packaging] drop getabis data
    - [Packaging] update variants
    - [Packaging] update annotations scripts
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2024.04.29)
  * Enable Nezha board (LP: #1975592)
    - [Config] Enable CONFIG_REGULATOR_FIXED_VOLTAGE on riscv64
  * Enable Nezha board (LP: #1975592) // Enable StarFive VisionFive 2 board
    (LP: #2013232)
    - [Config] Enable CONFIG_SERIAL_8250_DW on riscv64
  * RISC-V kernel config is out of sync with other archs (LP: #1981437)
    - [Config] Sync riscv64 config with other architectures
  * obsolete out-of-tree ivsc dkms in favor of in-tree one (LP: #2061747)
    - ACPI: scan: Defer enumeration of devices with a _DEP pointing to IVSC device
    - Revert "mei: vsc: Call wake_up() in the threaded IRQ handler"
    - mei: vsc: Unregister interrupt handler for system suspend
    - media: ipu-bridge: Add ov01a10 in Dell XPS 9315
    - SAUCE: media: ipu-bridge: Support more sensors
  * Fix after-suspend-mediacard/sdhc-insert test failed (LP: #2042500)
    - PCI/ASPM: Move pci_configure_ltr() to aspm.c
    - PCI/ASPM: Always build aspm.c
    - PCI/ASPM: Move pci_save_ltr_state() to aspm.c
    - PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
    - PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
    - PCI/ASPM: Disable L1 before configuring L1 Substates
    - PCI/ASPM: Update save_state when configuration changes
  * RTL8852BE fw security fail then lost WIFI function during suspend/resume
    cycle (LP: #2063096)
    - wifi: rtw89: download firmware with five times retry
  * intel_rapl_common: Add support for ARL and LNL (LP: #2061953)
    - powercap: intel_rapl: Add support for Lunar Lake-M paltform
    - powercap: intel_rapl: Add support for Arrow Lake
  * Kernel panic during checkbox stress_ng_test on Grace running noble 6.8
    (arm64+largemem) kernel (LP: #2058557)
    - aio: Fix null ptr deref in aio_complete() wakeup
  * Av...

Changed in linux-aws (Ubuntu Noble):
status: Fix Committed → Fix Released