[Potential Regression] dscr_inherit_exec_test from powerpc in ubuntu_kernel_selftests failed on B/E/F

Bug #1888332 reported by Po-Hsu Lin
This bug affects 1 person
Affects          Status         Importance   Assigned to                     Milestone
linux (Ubuntu)   Invalid        Undecided    Unassigned
Bionic           Fix Released   Medium       Thadeu Lima de Souza Cascardo
Eoan             Won't Fix      Undecided    Thadeu Lima de Souza Cascardo
Focal            Fix Released   Medium       Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
Code that writes the DSCR user MSR does not set dscr_inherit, which breaks DSCR restore during context switches and DSCR inheritance when forking. DSCR is used to control cache hinting. The cause is that no kernel interrupt is raised when the DSCR user MSR is written. This is controlled by FSCR, which would otherwise cause a facility unavailable interrupt.

[Test case]
apt-get source linux
cd linux-5.4.0/tools/testing/selftests/powerpc/
make -j 32
make -C dscr run_tests
make -C ptrace run_tests
make -C tm run_tests

Look for "not ok" versus "ok", especially for dscr_inherit_exec_test, ptrace-tar and tm-resched-tar.
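
Scanning the output by eye is error-prone on long runs. As a rough sketch (not part of the selftests themselves), the Python below picks out the "not ok" result lines. The function name failed_tests and the regular expression are illustrative assumptions; the sample text is copied from the logs in this report.

```python
import re

# Matches kselftest result lines such as
#   "not ok 5 selftests: powerpc/dscr: dscr_inherit_exec_test # exit=1"
# Groups: 1 = status, 2 = test number, 3 = suite, 4 = test name.
RESULT_RE = re.compile(r"^\s*(not ok|ok)\s+(\d+)\s+selftests:\s+(\S+):\s+(\S+)")


def failed_tests(output):
    """Return the names of tests reported as "not ok" in the given output."""
    failures = []
    for line in output.splitlines():
        match = RESULT_RE.match(line)
        if match and match.group(1) == "not ok":
            failures.append(match.group(4))
    return failures


# Output of the failing run, as quoted in this report.
FAILING_RUN = """\
 # selftests: powerpc/dscr: dscr_inherit_exec_test
 # test: dscr_inherit_exec_test
 # tags: git_version:unknown
 # Parent DSCR 1 was not inherited over exec (kernel value)
 # Child didn't exit cleanly
 # failure: dscr_inherit_exec_test
 not ok 5 selftests: powerpc/dscr: dscr_inherit_exec_test # exit=1
"""

# Output of the passing run, as quoted in this report.
PASSING_RUN = """\
# selftests: powerpc/dscr: dscr_inherit_exec_test
# test: dscr_inherit_exec_test
# success: dscr_inherit_exec_test
ok 5 selftests: powerpc/dscr: dscr_inherit_exec_test
"""

print(failed_tests(FAILING_RUN))
print(failed_tests(PASSING_RUN))
```

In practice the output of each "make -C ... run_tests" command could be saved to a file and passed to failed_tests() as a string.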

[Potential regression]
Manipulating DSCR might break on different machines (with DTs containing /cpus/ibm,powerpc-cpu-features, for example). Code that does so might crash because the facility unavailable interrupt handling might not be working correctly.
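
To tell which kind of machine a test node is, one option is to check whether the running system exposes that device tree node. This is a minimal sketch, assuming the device tree is visible under /proc/device-tree (the usual location on Linux systems booted with a device tree); the function name has_dt_cpu_features is hypothetical.

```python
import os


def has_dt_cpu_features(dt_root="/proc/device-tree"):
    """Return True if the device tree has a /cpus/ibm,powerpc-cpu-features node.

    dt_root is an assumption: it is where Linux normally exposes the
    flattened device tree as a directory hierarchy.
    """
    node = os.path.join(dt_root, "cpus", "ibm,powerpc-cpu-features")
    return os.path.isdir(node)


print("DT CPU features binding present:", has_dt_cpu_features())
```

On a machine without a device tree, or without that node, the function simply returns False.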

====================================
Issue found on 5.3.0-64.58 with P8 node modoc (passed with P9 node)

 # selftests: powerpc/dscr: dscr_inherit_exec_test
 # test: dscr_inherit_exec_test
 # tags: git_version:unknown
 # Parent DSCR 1 was not inherited over exec (kernel value)
 # Child didn't exit cleanly
 # failure: dscr_inherit_exec_test
 not ok 5 selftests: powerpc/dscr: dscr_inherit_exec_test # exit=1

Po-Hsu Lin (cypressyew)
tags: added: 5.3 eoan kqa-blocker ppc64el sru-20200629 ubuntu-kernel-selftests
Po-Hsu Lin (cypressyew) wrote :

Test passed with 5.3.0-62.56:
# selftests: powerpc/dscr: dscr_inherit_exec_test
# test: dscr_inherit_exec_test
# tags: git_version:f21e446-dirty
# success: dscr_inherit_exec_test
ok 5 selftests: powerpc/dscr: dscr_inherit_exec_test

Po-Hsu Lin (cypressyew) wrote :

I can see this failure in 5.3.0-63.57-generic as well.

Po-Hsu Lin (cypressyew) wrote :

Since 5.3.0-64.58 is a respin of 5.3.0-63.57, I think we can call this a potential regression.

summary: - dscr_inherit_exec_test from powerpc in ubuntu_kernel_selftests failed on
- E
+ [Potential Regression] dscr_inherit_exec_test from powerpc in
+ ubuntu_kernel_selftests failed on E
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1888332

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Eoan):
status: New → Incomplete
Po-Hsu Lin (cypressyew) wrote : Re: [Potential Regression] dscr_inherit_exec_test from powerpc in ubuntu_kernel_selftests failed on E

I can reproduce this on 5.3.0-64.58 with the testing tools from the 5.3.0-62.56 kernel tree.

Po-Hsu Lin (cypressyew) wrote : Re: [Potential Regression] dscr_inherit_exec_test from powerpc in ubuntu_kernel_selftests failed on E/F

Affecting Focal P8 as well

Note that you will have to review it directly on Jenkins; the whole test suite terminates at the ftracetest.

summary: [Potential Regression] dscr_inherit_exec_test from powerpc in
- ubuntu_kernel_selftests failed on E
+ ubuntu_kernel_selftests failed on E/F
tags: added: 5.4 focal sru-20200810
Po-Hsu Lin (cypressyew) wrote :

This failure can be found on 4.15.0-114.115~16.04.1 PowerPC as well

Po-Hsu Lin (cypressyew) wrote :

Spotted on 4.15.0-114.115 as well.

summary: [Potential Regression] dscr_inherit_exec_test from powerpc in
- ubuntu_kernel_selftests failed on E/F
+ ubuntu_kernel_selftests failed on B/E/F
tags: added: 4.15 bionic
tags: added: xenial
Po-Hsu Lin (cypressyew) wrote :

On Bionic P8, I can reproduce this issue against the proposed kernel (4.15.0-114.115) with the source code from 4.15.0-112, which indicates this might be a kernel issue rather than a test case issue.

Po-Hsu Lin (cypressyew) wrote :

Note for Bionic P9, this test didn't fail on node baltar.

Po-Hsu Lin (cypressyew) wrote :

A bisect for arch/powerpc/ shows this is the first bad commit on 4.15:

$ git bisect bad
7d10952e8a56f87a53fc57594078555a9dfd4a07 is the first bad commit
commit 7d10952e8a56f87a53fc57594078555a9dfd4a07
Author: Michael Ellerman <email address hidden>
Date: Thu May 28 00:58:42 2020 +1000

    powerpc/64s: Save FSCR to init_task.thread.fscr after feature init

    BugLink: https://bugs.launchpad.net/bugs/1885176

    commit 912c0a7f2b5daa3cbb2bc10f303981e493de73bd upstream.

    At boot the FSCR is initialised via one of two paths. On most systems
    it's set to a hard coded value in __init_FSCR().

    On newer skiboot systems we use the device tree CPU features binding,
    where firmware can tell Linux what bits to set in FSCR (and HFSCR).

    In both cases the value that's configured at boot is not propagated
    into the init_task.thread.fscr value prior to the initial fork of init
    (pid 1), which means the value is not used by any processes other than
    swapper (the idle task).

    For the __init_FSCR() case this is OK, because the value in
    init_task.thread.fscr is initialised to something sensible. However it
    does mean that the value set in __init_FSCR() is not used other than
    for swapper, which is odd and confusing.

    The bigger problem is for the device tree CPU features case it
    prevents firmware from setting (or clearing) FSCR bits for use by user
    space. This means all existing kernels can not have features
    enabled/disabled by firmware if those features require
    setting/clearing FSCR bits.

    We can handle both cases by saving the FSCR value into
    init_task.thread.fscr after we have initialised it at boot. This fixes
    the bug for device tree CPU features, and will allow us to simplify
    the initialisation for the __init_FSCR() case in a future patch.

    Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
    Cc: <email address hidden> # v4.12+
    Signed-off-by: Michael Ellerman <email address hidden>
    Link: https://<email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
    Signed-off-by: Kamal Mostafa <email address hidden>
    Signed-off-by: Khalid Elmously <email address hidden>

:040000 040000 9c654d310ed9b7c4a1cf16620d120ec93624eda3 05b34a61190e63ccde3b7d02e2183dc64b32c812 M arch

$ git bisect log
git bisect start '--' 'arch/powerpc/'
# bad: [f4daf25f7f8608d1c14c85ea0b73c9e1e1eb2dba] UBUNTU: Ubuntu-4.15.0-114.115
git bisect bad f4daf25f7f8608d1c14c85ea0b73c9e1e1eb2dba
# good: [495149ddc61a5997857fda041ccd4c81cac46e00] UBUNTU: Ubuntu-4.15.0-112.113
git bisect good 495149ddc61a5997857fda041ccd4c81cac46e00
# bad: [07ad1246146fa49430d2455bd45db1c8da4d521c] powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run
git bisect bad 07ad1246146fa49430d2455bd45db1c8da4d521c
# good: [f30471f4138df69bd4585d91c1f31a282daa41e7] powerpc/64s: Don't let DT CPU features set FSCR_DSCR
git bisect good f30471f4138df69bd4585d91c1f31a282daa41e7
# bad: [0e198dfae237e9a9654d87b7c6df12146feaec26] sche...

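
For anyone retracing the bisect, the good/bad verdicts can be pulled out of a saved "git bisect log" with a few lines of Python. This is only a sketch: parse_bisect_log is a hypothetical helper, the regular expression is an assumption about the comment lines git writes, and it also tolerates a backslash-escaped "#" as sometimes seen in pasted logs. The sample is the untruncated part of the log above.

```python
import re

# Comment lines in a bisect log look like:
#   "# bad: [<sha>] <commit subject>"
# The optional leading backslash covers logs pasted with an escaped "#".
MARK_RE = re.compile(r"^\\?# (good|bad): \[([0-9a-f]+)\] (.*)$")


def parse_bisect_log(text):
    """Return a list of (verdict, sha, subject) tuples in log order."""
    steps = []
    for line in text.splitlines():
        match = MARK_RE.match(line.strip())
        if match:
            steps.append(match.groups())
    return steps


SAMPLE_LOG = """\
git bisect start '--' 'arch/powerpc/'
# bad: [f4daf25f7f8608d1c14c85ea0b73c9e1e1eb2dba] UBUNTU: Ubuntu-4.15.0-114.115
git bisect bad f4daf25f7f8608d1c14c85ea0b73c9e1e1eb2dba
# good: [495149ddc61a5997857fda041ccd4c81cac46e00] UBUNTU: Ubuntu-4.15.0-112.113
git bisect good 495149ddc61a5997857fda041ccd4c81cac46e00
# bad: [07ad1246146fa49430d2455bd45db1c8da4d521c] powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run
git bisect bad 07ad1246146fa49430d2455bd45db1c8da4d521c
# good: [f30471f4138df69bd4585d91c1f31a282daa41e7] powerpc/64s: Don't let DT CPU features set FSCR_DSCR
git bisect good f30471f4138df69bd4585d91c1f31a282daa41e7
"""

for verdict, sha, subject in parse_bisect_log(SAMPLE_LOG):
    print(f"{verdict:4} {sha[:12]} {subject}")
```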

Changed in linux (Ubuntu Eoan):
status: Incomplete → Confirmed
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This test failure is caused by the kernel not setting dscr_inherit when the user DSCR MSR is written to, which in turn is caused by FSCR not raising a facility unavailable interrupt. This is ironic, as one of the other backported patches is 993e3d96fd08c3ebf7566e43be9b8cd622063e6d ("powerpc/64s: Don't let DT CPU features set FSCR_DSCR"), which should prevent this situation.

dscr_inherit_test does not fail because it writes to the kernel MSR first. When dscr_inherit_exec_test is changed to do the same, it passes.
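
A toy model may help make this mechanism concrete. The Python sketch below is not kernel code: the Thread class, its method names and the SYSTEM_DEFAULT_DSCR constant are hypothetical, and it assumes that (a) a write to the user register only traps when the DSCR bit in the thread's FSCR is clear, (b) a write to the kernel register from userspace always traps and is handled by the kernel, and (c) a thread without dscr_inherit ends up with the system default value after exec. The last point collapses the context-switch restore and exec into one step for brevity.

```python
from dataclasses import dataclass

SYSTEM_DEFAULT_DSCR = 0  # hypothetical stand-in for the system-wide default


@dataclass
class Thread:
    fscr_dscr: bool                  # DSCR facility bit in this thread's FSCR
    dscr: int = SYSTEM_DEFAULT_DSCR
    dscr_inherit: bool = False

    def write_user_dscr(self, value):
        """Model a userspace write to the user DSCR register."""
        if not self.fscr_dscr:
            # Facility unavailable interrupt: the kernel records that this
            # thread changed DSCR and enables the facility for it.
            self.dscr_inherit = True
            self.fscr_dscr = True
        # If the facility was already enabled, no interrupt happens and
        # the kernel never gets the chance to set dscr_inherit.
        self.dscr = value

    def write_kernel_dscr(self, value):
        """Model a userspace write to the kernel (privileged) register.

        Assumption: this always traps, so the kernel always sees it.
        """
        self.dscr_inherit = True
        self.dscr = value

    def after_exec(self):
        """Model the DSCR value seen by the program after exec."""
        value = self.dscr if self.dscr_inherit else SYSTEM_DEFAULT_DSCR
        return Thread(self.fscr_dscr, value, self.dscr_inherit)


# Failing case: FSCR already has the DSCR bit set, user write does not trap.
broken = Thread(fscr_dscr=True)
broken.write_user_dscr(1)
print(broken.after_exec().dscr)        # prints 0: value 1 was not inherited

# Expected case: DSCR bit clear, first user write traps.
trapping = Thread(fscr_dscr=False)
trapping.write_user_dscr(1)
print(trapping.after_exec().dscr)      # prints 1

# dscr_inherit_test-like case: kernel register written first.
kernel_first = Thread(fscr_dscr=True)
kernel_first.write_kernel_dscr(1)
kernel_first.write_user_dscr(1)
print(kernel_first.after_exec().dscr)  # prints 1
```

The first case corresponds to the "Parent DSCR 1 was not inherited over exec" message in the failing log; the third shows why a test that touches the kernel register first would not notice the problem.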

Testing 5.8 shows that it's fixed somehow, though I haven't found the exact commit that would fix this. Reverting 912c0a7f2b5daa3cbb2bc10f303981e493de73bd ("powerpc/64s: Save FSCR to init_task.thread.fscr after feature init") seems reasonable here.

But failing to inherit DSCR would cause cache performance regressions, for which it is hard to justify respinning the kernel.

Cascardo.

Changed in linux (Ubuntu Eoan):
status: Confirmed → Won't Fix
no longer affects: ubuntu-kernel-tests
no longer affects: ubuntu-kernel-tests/trunk
Changed in linux (Ubuntu Focal):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → Triaged
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Apparently, cherry-picking 0828137e8f16721842468e33df0460044a0c588b ("powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()") fixes the issue, which explains why 5.8 does not show the problem.

Building a kernel with that patch applied so I can test it.

Cascardo.

description: updated
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
importance: Undecided → High
importance: High → Medium
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-45.49

---------------
linux (5.4.0-45.49) focal; urgency=medium

  * focal/linux: 5.4.0-45.49 -proposed tracker (LP: #1893050)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (5.4.0-44.48) focal; urgency=medium

  * focal/linux: 5.4.0-44.48 -proposed tracker (LP: #1891049)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (5.4.0-43.47) focal; urgency=medium

  * focal/linux: 5.4.0-43.47 -proposed tracker (LP: #1890746)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Devlink - add RoCE disable kernel support (LP: #1877270)
    - devlink: Add new "enable_roce" generic device param
    - net/mlx5: Document flow_steering_mode devlink param
    - net/mlx5: Handle "enable_roce" devlink param
    - IB/mlx5: Rename profile and init methods
    - IB/mlx5: Load profile according to RoCE enablement state
    - net/mlx5: Remove unneeded variable in mlx5_unload_one
    - net/mlx5: Add devlink reload
    - IB/mlx5: Do reverse sequence during device removal

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - Revert "UBUNTU: [Config] Disable hisi_sec2 temporarily"
    - crypto: hisilicon - update SEC driver module parameter

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * ASoC:amd:renoir: the dmic can't record sound after suspend and resume
    (LP: #1890220)
    - SAUCE: ASoC: amd: renoir: restore two more registers during resume

  * No sound, Dummy output on Acer Swift 3 SF314-57G with Ice Lake core-i7 CPU
    (LP: #1877757)
    - ASoC: SOF: Intel: hda: fix generic hda codec support

  * Fix right speaker of HP laptop (LP: #1889375)
    - SAUCE: hda/realtek: Fix right speaker of HP laptop

  * blk_update_request error when mount nvme partition (LP: #1872383)
    - SAUCE: nvme-pci: prevent SK hynix PC400 from using Write Zeroes command

  * soc/amd/renoir: detect dmic from acpi table (LP: #1887734)
    - ASoC: amd: add logic to check dmic hardware runtime
    - ASoC: amd: add ACPI dependency check
    - ASoC: amd: fixed kernel warnings

  * soc/amd/renoir: change the module name to make it work with ucm3
    (LP: #1888166)
    - AsoC: amd: add missing snd- module prefix to the acp3x-rn driver kernel
      module
    - SAUCE: remove a kernel module since its name is changed

  * Focal update: v5.4.55 upstream stable release (LP: #1890343)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    - dev: Defer free of skbs in flush_backlog
    - drivers/net/wan/x25_asy: Fix to make i...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-115.116

---------------
linux (4.15.0-115.116) bionic; urgency=medium

  * bionic/linux: 4.15.0-115.116 -proposed tracker (LP: #1893055)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (4.15.0-114.115) bionic; urgency=medium

  * bionic/linux: 4.15.0-114.115 -proposed tracker (LP: #1891052)

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (4.15.0-113.114) bionic; urgency=medium

  * bionic/linux: 4.15.0-113.114 -proposed tracker (LP: #1890705)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Reapply "usb: handle warm-reset port requests on hub resume" (LP: #1859873)
    - usb: handle warm-reset port requests on hub resume

  * Bionic update: upstream stable patchset 2020-07-29 (LP: #1889474)
    - gpio: arizona: handle pm_runtime_get_sync failure case
    - gpio: arizona: put pm_runtime in case of failure
    - pinctrl: amd: fix npins for uart0 in kerncz_groups
    - mac80211: allow rx of mesh eapol frames with default rx key
    - scsi: scsi_transport_spi: Fix function pointer check
    - xtensa: fix __sync_fetch_and_{and,or}_4 declarations
    - xtensa: update *pos in cpuinfo_op.next
    - drivers/net/wan/lapbether: Fixed the value of hard_header_len
    - net: sky2: initialize return of gm_phy_read
    - drm/nouveau/i2c/g94-: increase NV_PMGR_DP_AUXCTL_TRANSACTREQ timeout
    - irqdomain/treewide: Keep firmware node unconditionally allocated
    - SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO
      compeletion")
    - spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
    - IB/umem: fix reference count leak in ib_umem_odp_get()
    - uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix
      GDB regression
    - ALSA: info: Drop WARN_ON() from buffer NULL sanity check
    - ASoC: rt5670: Correct RT5670_LDO_SEL_MASK
    - btrfs: fix double free on ulist after backref resolution failure
    - btrfs: fix mount failure caused by race with umount
    - btrfs: fix page leaks after failure to lock page for delalloc
    - bnxt_en: Fix race when modifying pause settings.
    - hippi: Fix a size used in a 'pci_free_consistent()' in an error handling
      path
    - ax88172a: fix ax88172a_unbind() failures
    - net: dp83640: fix SIOCSHWTSTAMP to update the struct with actual
      configuration
    - drm: sun4i: hdmi: Fix inverted HPD result
    - net: smc91x: Fix possible memory leak in smc_drv_probe()
    - bonding: check error value of register_netdevice() immediately
    - mlxsw: destroy workqueue when trap_register in mlxsw_emad_init
    - ipvs: fix the connection sync failed in some cases
    - i2c: rcar: always clear ICSAR to avoid side effects
    - bonding: check return value of register_netdevice() in bond_newlink()
    - serial: exar: Fix GPIO configuration for Sealevel cards based on XR17V35X
    - scripts/decode_stacktrace: strip basepath from all paths
    - HID: i...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released