efivarfs:efivarfs.sh in ubuntu_kernel_selftests crash L-6.2 ARM64 node dazzle (rcu_preempt detected stalls)

Bug #2015741 reported by Po-Hsu Lin
Affects               Status         Importance   Assigned to   Milestone
ubuntu-kernel-tests   Fix Released   Undecided    Unassigned
linux (Ubuntu)        Fix Released   Undecided    Unassigned

Bug Description

Issue found with 6.2.0-20.20 on the ARM64 node "dazzle" only.

It looks like efivarfs:efivarfs.sh never finishes, so the whole test run is killed by the timeout.

Test log:
 Running 'make run_tests -C efivarfs TEST_PROGS=efivarfs.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
 make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/efivarfs'
 TAP version 13
 1..1
 # selftests: efivarfs: efivarfs.sh
 # --------------------
 # running test_create
 # --------------------
 Timer expired (5400 sec.), nuking pid 21170

This issue has been seen since 6.2.0-19.19.

This node was not tested with 6.2.0-18.18.

Po-Hsu Lin (cypressyew)
tags: added: 6.2 arm64 lunar ubuntu-kernel-selftests
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2015741

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Po-Hsu Lin (cypressyew) wrote : Re: memfd:run_fuse_test.sh in ubuntu_kernel_selftests crash L-6.2 ARM64

I found this issue is limited to node "dazzle"; other ARM64 nodes such as kopter-kernel and scobee-kernel do not have this issue.

Also, a manual test shows the real cause seems to be efivarfs.sh in efivarfs, which shows up as a timeout failure in the RT test report.

Running the test manually on dazzle with "set -x" added to the script:
$ sudo ./efivarfs.sh
+ check_prereqs
+ local 'msg=skip all tests:'
+ '[' 0 '!=' 0 ']'
+ grep -q '^\S\+ /sys/firmware/efi/efivars efivarfs' /proc/mounts
+ rc=0
+ run_test test_create
+ local test=test_create
+ echo --------------------
--------------------
+ echo 'running test_create'
running test_create
+ echo --------------------
--------------------
++ type -t test_create
+ '[' function = function ']'
+ test_create
+ local 'attrs=\x07\x00\x00\x00'
+ local file=/sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27
+ printf '\x07\x00\x00\x00\x00'
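
For readers unfamiliar with what test_create is writing at the point where the trace stops, here is a minimal Python sketch of the data involved. It assumes the usual efivarfs layout (a 4-byte little-endian attribute word followed by the variable data, in a file named `<name>-<guid>`). The helper names `build_efivar_payload` and `efivar_path` are hypothetical and not part of the selftest. The sketch only builds the bytes and the path; it deliberately does not write to efivarfs.

```python
import struct

# UEFI variable attribute bits (values as defined by the UEFI specification).
EFI_VARIABLE_NON_VOLATILE = 0x1
EFI_VARIABLE_BOOTSERVICE_ACCESS = 0x2
EFI_VARIABLE_RUNTIME_ACCESS = 0x4


def build_efivar_payload(attributes: int, data: bytes) -> bytes:
    """Hypothetical helper: 4-byte little-endian attributes, then the data."""
    return struct.pack("<I", attributes) + data


def efivar_path(name: str, guid: str,
                root: str = "/sys/firmware/efi/efivars") -> str:
    """Hypothetical helper: efivarfs exposes each variable as '<name>-<guid>'."""
    return f"{root}/{name}-{guid}"


# Same values as in the trace: attrs=\x07\x00\x00\x00 plus one data byte \x00.
attrs = (EFI_VARIABLE_NON_VOLATILE
         | EFI_VARIABLE_BOOTSERVICE_ACCESS
         | EFI_VARIABLE_RUNTIME_ACCESS)
payload = build_efivar_payload(attrs, b"\x00")
path = efivar_path("test_create", "210be57c-9849-4fc7-a635-e6382d1aec27")

print(path)
print(payload.hex())
```

Writing such a payload to the path is what ends up in virt_efi_set_variable in the kernel, which is where the script blocks in the dmesg output.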

dmesg:
[ 420.122478] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 420.128599] rcu: 20-...0: (9 GPs behind) idle=1a04/1/0x4000000000000000 softirq=1701/1701 fqs=6790
[ 420.137665] (detected by 30, t=15005 jiffies, g=6805, q=768 ncpus=32)
[ 420.137670] Task dump for CPU 20:
[ 420.137672] task:kworker/u64:0 state:R running task stack:0 pid:9 ppid:2 flags:0x0000000a
[ 420.137680] Workqueue: efi_rts_wq efi_call_rts
[ 420.137691] Call trace:
[ 420.137693] __switch_to+0xbc/0x100
[ 420.137699] 0xffff80002585bb2c
[ 484.991153] INFO: task efivarfs.sh:1786 blocked for more than 120 seconds.
[ 484.998061] Not tainted 6.2.0-20-generic #20-Ubuntu
[ 485.003478] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 485.011332] task:efivarfs.sh state:D stack:0 pid:1786 ppid:1782 flags:0x00000004
[ 485.011339] Call trace:
[ 485.011341] __switch_to+0xbc/0x100
[ 485.011349] __schedule+0x2fc/0x7b0
[ 485.011353] schedule+0x68/0x160
[ 485.011356] schedule_timeout+0x1a8/0x1dc
[ 485.011360] wait_for_completion+0xe0/0x180
[ 485.011364] virt_efi_set_variable+0x16c/0x220
[ 485.011369] efivar_set_variable_locked+0x80/0x120
[ 485.011372] efivar_entry_set_get_size+0xc4/0x180
[ 485.011377] efivarfs_file_write+0xb0/0x1d0
[ 485.011380] vfs_write+0xd0/0x310
[ 485.011385] ksys_write+0x7c/0x130
[ 485.011388] __arm64_sys_write+0x28/0x50
[ 485.011392] invoke_syscall+0x7c/0x124
[ 485.011396] el0_svc_common.constprop.0+0x5c/0x1cc
[ 485.011399] do_el0_svc+0x38/0x60
[ 485.011402] el0_svc+0x30/0xe0
[ 485.011407] el0t_64_sync_handler+0x11c/0x150
[ 485.011411] el0t_64_sync+0x1a8/0x1ac
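
When triaging other nodes for the same symptom, the two key messages above (the RCU stall report and the hung-task report) can be picked out of a dmesg dump mechanically. The following is a rough sketch only: the function name `scan_dmesg` and the regular expressions are assumptions based on the message wording in this log, not an official parser, and the wording may differ between kernel versions.

```python
import re

# Patterns modelled on the messages in this report (an assumption, not a spec).
RCU_STALL = re.compile(r"rcu: INFO: (\S+) detected stalls")
HUNG_TASK = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def scan_dmesg(text):
    """Return a list of tuples describing stall and hung-task messages."""
    findings = []
    for line in text.splitlines():
        match = RCU_STALL.search(line)
        if match:
            # ("rcu_stall", RCU flavour)
            findings.append(("rcu_stall", match.group(1)))
            continue
        match = HUNG_TASK.search(line)
        if match:
            # ("hung_task", task name, pid, seconds blocked)
            findings.append(
                ("hung_task", match.group(1),
                 int(match.group(2)), int(match.group(3)))
            )
    return findings


sample = """\
[ 420.122478] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 420.137680] Workqueue: efi_rts_wq efi_call_rts
[ 484.991153] INFO: task efivarfs.sh:1786 blocked for more than 120 seconds.
"""

findings = scan_dmesg(sample)
for finding in findings:
    print(finding)
```

In practice the input would come from saved `dmesg` output rather than the inline sample used here.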

I will modify the bug title and description accordingly.

summary: - memfd:run_fuse_test.sh in ubuntu_kernel_selftests crash L-6.2 ARM64
+ efivarfs:efivarfs.sh in ubuntu_kernel_selftests crash L-6.2 ARM64 node
+ dazzle (rcu_preempt detected stalls)
Po-Hsu Lin (cypressyew)
description: updated
Andrea Righi (arighi) wrote :

Same issue as LP: #2011748, which should be fixed by a6d8a9c1e5fa ("arm64: efi: Use SMBIOS processor version to key off Ampere quirk").

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.2.0-21.21

---------------
linux (6.2.0-21.21) lunar; urgency=medium

  * lunar/linux: 6.2.0-21.21 -proposed tracker (LP: #2016249)

  * efivarfs:efivarfs.sh in ubuntu_kernel_selftests crash L-6.2 ARM64 node
    dazzle (rcu_preempt detected stalls) (LP: #2015741)
    - efi/libstub: smbios: Use length member instead of record struct size
    - arm64: efi: Use SMBIOS processor version to key off Ampere quirk
    - efi/libstub: smbios: Drop unused 'recsize' parameter

  * Miscellaneous Ubuntu changes
    - SAUCE: selftests/bpf: ignore pointer types check with clang
    - SAUCE: selftests/bpf: avoid conflicting data types in profiler.inc.h
    - [Packaging] get rid of unnecessary artifacts in linux-headers

  * Miscellaneous upstream changes
    - Revert "UBUNTU: SAUCE: Revert "efi: random: refresh non-volatile random seed
      when RNG is initialized""
    - Revert "UBUNTU: SAUCE: Revert "efi: random: fix NULL-deref when refreshing
      seed""

 -- Andrea Righi <email address hidden> Fri, 14 Apr 2023 12:11:49 +0200

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Po-Hsu Lin (cypressyew) wrote :

Closing this as the kernel bug has been marked as fixed (and dazzle is dead).

Changed in ubuntu-kernel-tests:
status: New → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.2.0-21.21 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux verification-needed-lunar
Andrei Gherzan (agherzan) wrote :

As cypressyew mentioned above, dazzle is dead, so we can close and mark this bug verified for lunar.

tags: added: verification-done-lunar
removed: verification-needed-lunar
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.2.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-azure verification-needed-lunar
removed: verification-done-lunar