[linux-azure] Panic when triggering hibernation

Bug #1891931 reported by Marcelo Cerri
This bug affects 1 person
Affects                 Status         Importance   Assigned to   Milestone
linux-azure (Ubuntu)    Invalid        Undecided    Unassigned
  Focal                 Fix Released   Medium       Unassigned

Bug Description

[Impact]

We backported several upstream commits in LP #1880032, but the following commit was not necessary and causes a panic when trying to hibernate an Azure instance, as described in comment #8 of LP #1880032 (https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/8):

0a14dbaa0736 ("video: hyperv_fb: Fix hibernation for the deferred IO feature"):
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/focal/commit/?h=Ubuntu-azure-5.4.0-1022.22&id=0a14dbaa0736a6021c02e74d42cf3a7ca5438da6

We should include the patch only if the kernel also includes
a4ddb11d297e ("video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver").

I managed to reproduce the panic and can confirm that reverting the offending commit solves the problem. I tested the revert on several D and E instance types and the system hibernates successfully. I also tested scenarios with high memory usage: on an 8 GB VM, hibernation worked up to 70% memory utilization.

[ 67.736061] ------------[ cut here ]------------
[ 67.736068] WARNING: CPU: 5 PID: 1358 at kernel/workqueue.c:3040 __flush_work+0x1b5/0x1d0
[ 67.736068] Modules linked in: xt_owner iptable_security xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper joydev hid_generic hyperv_fb cfbfillrect hid_hyperv intel_rapl_perf serio_raw hyperv_keyboard pata_acpi hv_netvsc hv_balloon hid cfbimgblt pci_hyperv cfbcopyarea hv_utils pci_hyperv_intf sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables autofs4
[ 67.736088] CPU: 5 PID: 1358 Comm: bash Not tainted 5.4.0-1022-azure #22-Ubuntu
[ 67.736089] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 67.736091] RIP: 0010:__flush_work+0x1b5/0x1d0
[ 67.736092] Code: f0 eb e3 4d 8b 7c 24 20 e9 f3 fe ff ff 8b 0b 48 8b 53 08 83 e1 08 48 0f ba 2b 03 80 c9 f0 e9 4f ff ff ff 0f 0b e9 68 ff ff ff <0f> 0b 45 31 f6 e9 5e ff ff ff e8 ec e0 fd ff 66 66 2e 0f 1f 84 00
[ 67.736095] RSP: 0018:ffffa7ce8a8ffb78 EFLAGS: 00010246
[ 67.736096] RAX: 0000000000000000 RBX: ffff8be3621f02a0 RCX: 0000000000000000
[ 67.736096] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8be3621f02a0
[ 67.736097] RBP: ffffa7ce8a8ffbf0 R08: 0000000000000000 R09: 00000000ff010101
[ 67.736098] R10: ffff8be363f7a320 R11: 0000000000000001 R12: ffff8be3621f02a0
[ 67.736098] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffffbc390fd1
[ 67.736099] FS: 00007f6df35fe740(0000) GS:ffff8be375d40000(0000) knlGS:0000000000000000
[ 67.736100] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 67.736100] CR2: 0000561eef2c1b50 CR3: 0000000e40a14004 CR4: 00000000001706e0
[ 67.736102] Call Trace:
[ 67.736108] __cancel_work_timer+0x107/0x180
[ 67.736119] cancel_delayed_work_sync+0x13/0x20
[ 67.736121] hvfb_suspend+0x48/0x80 [hyperv_fb]
[ 67.736122] vmbus_suspend+0x2a/0x40
[ 67.736125] dpm_run_callback+0x5b/0x150
[ 67.736127] __device_suspend_noirq+0x9e/0x2f0
[ 67.736128] dpm_suspend_noirq+0x101/0x2d0
[ 67.736130] dpm_suspend_end+0x53/0x80
[ 67.736132] hibernation_snapshot+0xd8/0x460
[ 67.736133] hibernate.cold+0x6d/0x1f6
[ 67.736135] state_store+0xde/0xe0
[ 67.736138] kobj_attr_store+0x12/0x20
[ 67.736141] sysfs_kf_write+0x3e/0x50
[ 67.736142] kernfs_fop_write+0xda/0x1b0
[ 67.736145] __vfs_write+0x1b/0x40
[ 67.736147] vfs_write+0xb9/0x1a0
[ 67.736149] ksys_write+0x67/0xe0
[ 67.736150] __x64_sys_write+0x1a/0x20
[ 67.736152] do_syscall_64+0x5e/0x200
[ 67.736156] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 67.736157] RIP: 0033:0x7f6df3712057
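
When triaging a log like the one above, it can help to check mechanically whether a given function (here `hvfb_suspend`) appears in the call trace of a `cut here` warning block. The following is a minimal sketch, not a tool used in this report: the helper name `warning_call_traces` is hypothetical, and it assumes trace lines have the `symbol+0xOFF/0xLEN` shape seen in this log.

```python
import re

def warning_call_traces(log_text):
    """Return one list of function names per '[ cut here ]' block in a kernel log.

    Call traces that are not preceded by a '[ cut here ]' marker are ignored.
    """
    traces, current, in_trace = [], None, False
    for line in log_text.splitlines():
        # Drop a leading "[ 67.736061] " style timestamp if present.
        body = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line).strip()
        if "[ cut here ]" in body:
            current, in_trace = [], False
            traces.append(current)
        elif current is not None and body == "Call Trace:":
            in_trace = True
        elif in_trace:
            m = re.match(r"\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+", body)
            if m:
                current.append(m.group(1))
            else:
                in_trace = False  # e.g. the "RIP: 0033:..." line ends the trace
    return traces

# A shortened excerpt of the log in this report.
sample = """\
[ 67.736061] ------------[ cut here ]------------
[ 67.736068] WARNING: CPU: 5 PID: 1358 at kernel/workqueue.c:3040 __flush_work+0x1b5/0x1d0
[ 67.736102] Call Trace:
[ 67.736108] __cancel_work_timer+0x107/0x180
[ 67.736119] cancel_delayed_work_sync+0x13/0x20
[ 67.736121] hvfb_suspend+0x48/0x80 [hyperv_fb]
[ 67.736122] vmbus_suspend+0x2a/0x40
[ 67.736157] RIP: 0033:0x7f6df3712057
"""

traces = warning_call_traces(sample)
print(any("hvfb_suspend" in t for t in traces))  # True
```

On a live system the same function could be fed the output of `dmesg`; that usage is an assumption and was not part of the original test procedure.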

[Test Case]

Follow the steps from https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14.
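
Before running the test case, it may be useful to confirm whether the running kernel is older than the package that shipped the revert (5.4.0-1023.23, per the changelog later in this report). The sketch below is illustrative only: the helper names are hypothetical, and it assumes the release string has the `5.4.0-<ABI>-azure` form seen in the panic log. It says nothing about kernel series other than focal linux-azure 5.4.0.

```python
import platform
import re

FIXED_ABI = 1023  # from "linux-azure - 5.4.0-1023.23" in this report

def azure_abi(release):
    """Extract the ABI number from a string like '5.4.0-1022-azure'."""
    m = re.fullmatch(r"5\.4\.0-(\d+)-azure", release)
    return int(m.group(1)) if m else None

def at_or_after_fix(release):
    """True/False for focal linux-azure 5.4.0 kernels, None for anything else."""
    abi = azure_abi(release)
    return None if abi is None else abi >= FIXED_ABI

if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print(platform.release(), at_or_after_fix(platform.release()))
```

A result of `None` means the release string did not match the assumed pattern, so no conclusion is drawn.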

[Regression Potential]

The revert touches the Hyper-V framebuffer driver and could potentially prevent the VM from booting or cause hibernation to fail again, although the risk is low.

Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Focal):
status: New → In Progress
Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Focal):
status: In Progress → Fix Committed
Stefan Bader (smb)
Changed in linux-azure (Ubuntu):
status: New → Invalid
Changed in linux-azure (Ubuntu Focal):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.4.0-1023.23

---------------
linux-azure (5.4.0-1023.23) focal; urgency=medium

  * focal/linux-azure: 5.4.0-1023.23 -proposed tracker (LP: #1890736)

  * Focal update: v5.4.52 upstream stable release (LP: #1887853)
    - [Packaging] module intel-rapl-perf rename

  * Focal update: v5.4.53 upstream stable release (LP: #1888560)
    - [Config] updateconfigs for BLK_DEV_SR_VENDOR

  * Focal update: v5.4.51 upstream stable release (LP: #1886995)
    - [Config] updateconfigs for EFI_CUSTOM_SSDT_OVERLAYS

  * Packaging resync (LP: #1786013)
    - [Packaging] update variants
    - [Packaging] update update.conf

  * [linux-azure] Panic when triggering hibernation (LP: #1891931)
    - Revert "video: hyperv_fb: Fix hibernation for the deferred IO feature"

  [ Ubuntu: 5.4.0-44.48 ]

  * focal/linux: 5.4.0-44.48 -proposed tracker (LP: #1891049)
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

  [ Ubuntu: 5.4.0-43.47 ]

  * focal/linux: 5.4.0-43.47 -proposed tracker (LP: #1890746)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * Devlink - add RoCE disable kernel support (LP: #1877270)
    - devlink: Add new "enable_roce" generic device param
    - net/mlx5: Document flow_steering_mode devlink param
    - net/mlx5: Handle "enable_roce" devlink param
    - IB/mlx5: Rename profile and init methods
    - IB/mlx5: Load profile according to RoCE enablement state
    - net/mlx5: Remove unneeded variable in mlx5_unload_one
    - net/mlx5: Add devlink reload
    - IB/mlx5: Do reverse sequence during device removal
  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test
  * Enlarge hisi_sec2 capability (LP: #1890222)
    - Revert "UBUNTU: [Config] Disable hisi_sec2 temporarily"
    - crypto: hisilicon - update SEC driver module parameter
  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity
  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken
  * ASoC:amd:renoir: the dmic can't record sound after suspend and resume
    (LP: #1890220)
    - SAUCE: ASoC: amd: renoir: restore two more registers during resume
  * No sound, Dummy output on Acer Swift 3 SF314-57G with Ice Lake core-i7 CPU
    (LP: #1877757)
    - ASoC: SOF: Intel: hda: fix generic hda codec support
  * Fix right speaker of HP laptop (LP: #1889375)
    - SAUCE: hda/realtek: Fix right speaker of HP laptop
  * blk_update_request error when mount nvme partition (LP: #1872383)
    - SAUCE: nvme-pci: prevent SK hynix PC400 from using Write Zeroes command
  * soc/amd/renoir: detect dmic from acpi table (LP: #1887734)
    - ASoC: amd: add logic to check dmic hardware runtime
    - ASoC: amd: add ACPI dependency check
    - ASoC: amd: fixed kernel warnings
  * soc/amd/renoir: change the module name to make it work with ucm3
    (LP: #1888166)
    - AsoC: amd: ad...

Changed in linux-azure (Ubuntu Focal):
status: Fix Committed → Fix Released
Dexuan Cui (decui) wrote :

I can confirm that hibernation now works with 5.4.0-1023, despite a harmless warning:

root@decui-tmp-2004:~# echo disk >/sys/power/state
[ 56.945758] PM: hibernation entry
[ 57.165520] Filesystems sync: 0.007 seconds
[ 57.169492] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 57.177529] OOM killer disabled.
[ 57.180702] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
[ 57.185925] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
[ 57.191239] PM: Marking nosave pages: [mem 0x3fff0000-0xffffffff]
[ 57.197810] PM: Basic memory bitmaps created
[ 57.201563] PM: Preallocating image memory... done (allocated 210160 pages)
[ 57.623616] PM: Allocated 840640 kbytes in 0.41 seconds (2050.34 MB/s)
[ 57.629195] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[ 57.637795] serial 00:04: disabled
[ 58.847939] Disabling non-boot CPUs ...
[ 58.852140] smpboot: CPU 1 is now offline
[ 58.857921] smpboot: CPU 2 is now offline
[ 58.863623] smpboot: CPU 3 is now offline
[ 58.869363] unchecked MSR access error: WRMSR to 0x40000106 (tried to write 0x412d4f49 000100ee) at rIP: 0xffffffff9ee1d9b8 (hv_cpu_die+0xe8/0x110)
[ 58.870052] Call Trace:
[ 58.870052] hv_suspend+0x5a/0x87
[ 58.870052] syscore_suspend+0x59/0x1a0
[ 58.870052] hibernation_snapshot+0x1bc/0x460
[ 58.870052] hibernate.cold+0x6d/0x1f6
[ 58.870052] state_store+0xde/0xe0
[ 58.870052] kobj_attr_store+0x12/0x20
[ 58.870052] sysfs_kf_write+0x3e/0x50
[ 58.870052] kernfs_fop_write+0xda/0x1b0
[ 58.870052] __vfs_write+0x1b/0x40
[ 58.870052] vfs_write+0xb9/0x1a0
[ 58.870052] ksys_write+0x67/0xe0
[ 58.870052] __x64_sys_write+0x1a/0x20
[ 58.870052] do_syscall_64+0x5e/0x200
[ 58.870052] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 58.870052] RIP: 0033:0x7f2f9dfcb057
[ 58.870052] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 58.870052] RSP: 002b:00007ffe96046608 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 58.870052] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f2f9dfcb057
[ 58.870052] RDX: 0000000000000005 RSI: 000055ca5250c450 RDI: 0000000000000001
[ 58.870052] RBP: 000055ca5250c450 R08: 000000000000000a R09: 0000000000000004
[ 58.870052] R10: 000055ca50a2d017 R11: 0000000000000246 R12: 0000000000000005
[ 58.870052] R13: 00007f2f9e0a66a0 R14: 00007f2f9e0a74a0 R15: 00007f2f9e0a68a0
[ 58.870052] PM: Creating hibernation image:
[ 58.870052] PM: Need to copy 201788 pages
[ 58.870052] PM: Normal pages needed: 201788 + 1024, available pages: 3992087
[ 58.870052] PM: Hibernation image created (201788 pages copied)
[ 58.870052] Enabling non-boot CPUs ...
[ 58.870052] x86: Booting SMP configuration:
[ 58.871862] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 58.875719] CPU1 is up
[ 58.877194] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 58.881047] CPU2 is up
[ 58.882499] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 58.886033] CPU3 is up
[ 58.891099] hv_utils: KVP IC version 4.0
[ 58.893181] ...
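
As a sanity check on the figures in the log above, the page counts can be converted to sizes. This is a small worked example, assuming 4 KiB pages (the usual x86-64 base page size; the log does not state it explicitly, but the preallocation figures are consistent with it).

```python
PAGE_KIB = 4  # assumption: 4 KiB pages

def pages_to_kib(pages, page_kib=PAGE_KIB):
    """Convert a page count to KiB."""
    return pages * page_kib

preallocated_kib = pages_to_kib(210160)  # "allocated 210160 pages"
image_kib = pages_to_kib(201788)         # "201788 pages copied"

print(preallocated_kib)                  # 840640, matching "Allocated 840640 kbytes"
print(round(image_kib / 1024, 1))        # 788.2 (image size in MiB)
```

Under that assumption, the hibernation image in this run is roughly 788 MiB.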
