disable “CONFIG_HISI_DMA” config for ubuntu version

Bug #1936771 reported by Fred Kimmy
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
kunpeng920         Fix Released  Undecided   Ike Panhc
Ubuntu-20.04       Fix Released  Undecided   Ike Panhc
Ubuntu-20.04-hwe   Fix Released  Undecided   Ike Panhc
linux (Ubuntu)     Fix Released  Undecided   Ike Panhc
Focal              Fix Released  Undecided   Ike Panhc
Hirsute            Fix Released  Undecided   Ike Panhc
Impish             Fix Released  Undecided   Ike Panhc

Bug Description

[Impact]
Setting up a software RAID5 array on a kunpeng920 machine crashes the system because of a hisi_dma timeout. The issue reproduces with every Ubuntu kernel that includes the hisi_dma driver.

[Test Plan]
Set up a software RAID5 array and wait a few seconds. The kernel will crash.
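
A minimal sketch of the test plan, assuming `mdadm` is installed and three disposable partitions are available. The device names are hypothetical placeholders borrowed from the three-partition layout in the verification comments; the sketch only builds and prints the `mdadm --create` command, because running it overwrites the member partitions.

```python
import shlex

# Hypothetical member partitions; replace with real, disposable devices.
MEMBERS = ["/dev/sdb1", "/dev/sdb2", "/dev/sdb3"]


def raid5_create_argv(array: str, members: list[str]) -> list[str]:
    """Build an mdadm command line that creates a software RAID5 array."""
    if len(members) < 3:
        raise ValueError("RAID5 needs at least three member devices")
    return [
        "mdadm", "--create", array,
        "--level=5",
        f"--raid-devices={len(members)}",
        *members,
    ]


if __name__ == "__main__":
    argv = raid5_create_argv("/dev/md0", MEMBERS)
    # Print only; on a test machine, pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

After the array is created, the initial RAID5 rebuild starts on its own, which is when the crash was observed.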

[Regression Risk]
CONFIG_HISI_DMA only affects the kunpeng920 platform, so the risk to other platforms is minimal; a full regression test is needed on kunpeng920.
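
To check whether a given kernel builds the driver, one can read its config file. A minimal sketch, assuming the usual Kconfig output format (`CONFIG_X=y`, `CONFIG_X=m`, or `# CONFIG_X is not set`) and the `/boot/config-<release>` file that Ubuntu kernels install; the helper name `kconfig_value` is hypothetical.

```python
import platform
import re
from pathlib import Path

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def kconfig_value(text: str, option: str) -> str:
    """Return the option's value ('y', 'm', ...) or 'n' if unset or absent."""
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m and m.group(1) == option:
            return m.group(2)
        m = _UNSET.match(line)
        if m and m.group(1) == option:
            return "n"
    return "n"


if __name__ == "__main__":
    # platform.release() is the same string as `uname -r` on Linux.
    cfg = Path(f"/boot/config-{platform.release()}")
    if cfg.exists():
        print("CONFIG_HISI_DMA =", kconfig_value(cfg.read_text(), "CONFIG_HISI_DMA"))
```

On an affected kernel this would typically report `m` (built as a module); a kernel carrying the "Disable CONFIG_HISI_DMA" change would be expected to report `n`.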

=======================
[Bug Description]
disable “CONFIG_HISI_DMA” config for ubuntu version

[Steps to Reproduce]
1)
2)
3)

[Actual Results]
This module causes errors.

[Expected Results]
ok
[Reproducibility]

[Additional information]
(Firmware version, kernel version, affected hardware, etc. if required):

[Resolution]


Ike Panhc (ikepanhc) wrote :

Can you please provide details of the error caused by CONFIG_HISI_DMA? Thanks a lot.

Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: New → Incomplete
Fred Kimmy (kongzizaixian) wrote :

[ 8343.152017] hisi_dma 0000:7b:00.0: dma_sync_wait: timeout!
[ 8343.157493] Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
[ 8343.165807] CPU: 29 PID: 19770 Comm: md5_raid5 Not tainted 5.8.0-41-generic #46~20.04.1-Ubuntu
[ 8343.174378] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V5.B143.01 04/22/2021
[ 8343.183985] Call trace:
[ 8343.186429] dump_backtrace+0x0/0x200
[ 8343.190075] show_stack+0x20/0x30
[ 8343.193380] dump_stack+0xc0/0x118
[ 8343.196770] panic+0x150/0x36c
[ 8343.199815] async_tx_submit+0x0/0x400 [async_tx]
[ 8343.204497] async_trigger_callback+0x94/0x15c [async_tx]
[ 8343.209881] raid_run_ops+0x8ec/0x1190 [raid456]
[ 8343.214479] handle_stripe+0x7b4/0x1050 [raid456]
[ 8343.219163] handle_active_stripes.isra.0+0x3d8/0x530 [raid456]
[ 8343.225056] raid5d+0x358/0x6a8 [raid456]
[ 8343.229054] md_thread+0xac/0x1a0
[ 8343.232358] kthread+0xf4/0x120
[ 8343.235489] ret_from_fork+0x10/0x18
[ 8343.239170] SMP: stopping secondary CPUs
[ 8343.243128] Kernel Offset: 0x5335c0680000 from 0xffff800010000000
[ 8343.249190] PHYS_OFFSET: 0xffffd31280000000
[ 8343.253354] CPU features: 0x040002,22a08a38
[ 8343.257518] Memory Limit: none
[ 8343.260673] ---[ end Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction ]---

Setting up a software RAID array on this system triggers the call trace above. Please disable the hisi_dma driver.
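
To tell whether a saved kernel log shows this particular crash, one can look for the two signature lines in order: the hisi_dma timeout and then the async_tx_quiesce panic. A minimal sketch; the patterns are copied from the trace above, and the function name is hypothetical.

```python
import re

# Signature lines taken from the panic log in this report.
_TIMEOUT = re.compile(r"hisi_dma \S+: dma_sync_wait: timeout!")
_PANIC = re.compile(r"Kernel panic - not syncing: async_tx_quiesce: DMA error")


def matches_hisi_dma_panic(log: str) -> bool:
    """True if a hisi_dma timeout is followed later by the async_tx panic."""
    timeout_seen = False
    for line in log.splitlines():
        if _TIMEOUT.search(line):
            timeout_seen = True
        elif timeout_seen and _PANIC.search(line):
            return True
    return False
```

Requiring the timeout first avoids flagging unrelated async_tx panics that were not preceded by a hisi_dma timeout.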

Ike Panhc (ikepanhc) wrote :

Thanks. I have a few more questions.

1) Is this issue only reproduced on 5.8 kernel?
2) Which RAID is needed to reproduce this issue?
3) Could you provide step-by-step instructions to reproduce this issue?

I have tried RAID1 on the 20.04 HWE and GA kernels but cannot reproduce it.

$ cat /proc/version;cat /proc/mdstat
Linux version 5.8.0-63-generic (buildd@bos02-arm64-056) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #71~20.04.1-Ubuntu SMP Thu Jul 15 17:46:44 UTC 2021
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb2[1] sdb1[0]
      104791040 blocks super 1.2 [2/2] [UU]

unused devices: <none>

Ike Panhc (ikepanhc) wrote :

I can reproduce this on the 5.4 and 5.11 kernels; the 4.15 kernel does not have CONFIG_HISI_DMA.

I have built a test kernel and am running regression tests.
https://kernel.ubuntu.com/~ikepanhc/lp1936771/

dann frazier (dannf) wrote :

Ah, I wasn't seeing the connection between DMA and MD RAID, but Kconfig clarifies:

menuconfig DMADEVICES
        bool "DMA Engine support"
        depends on HAS_DMA
        help
          DMA engines can do asynchronous data transfers without
          involving the host CPU. Currently, this framework can be
          used to offload memory copies in the network stack and
          RAID operations in the MD driver. This menu only presents
          DMA Device drivers supported by the configured arch, it may
          be empty in some cases.

Ike Panhc (ikepanhc) wrote :

The kernel backtrace goes through md_thread -> raid5d and ends in a hisi_dma timeout. I will try to find out how RAID uses the DMA engine and check whether turning the DMA engine off loses any functionality or only causes a performance downgrade.
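
As background for this question: RAID5 parity is the byte-wise XOR of the data blocks in a stripe, and the kernel's async_tx layer (the `[async_tx]` frames in the trace) can either hand memory copies and XORs to a DMA engine or perform them synchronously on the CPU when no suitable channel is available. The sketch below is a simplified user-space model of that idea, not kernel code; the `offload` hook is hypothetical and merely stands in for a DMA channel.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for a DMA engine that can compute XOR parity.
Offload = Callable[[Sequence[bytes]], bytes]


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """CPU path: byte-wise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks in a stripe must have the same size")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def compute_parity(blocks: Sequence[bytes], offload: Optional[Offload] = None) -> bytes:
    """Use the offload engine if one is present, otherwise fall back to the CPU."""
    if offload is not None:
        return offload(blocks)
    return xor_blocks(blocks)


def rebuild_missing(survivors: Sequence[bytes], parity: bytes) -> bytes:
    """Recover one lost data block as the XOR of the surviving blocks and parity."""
    return xor_blocks([*survivors, parity])
```

In this model both paths give identical parity, which is the "performance downgrade, not function loss" outcome being checked for; it does not show how the real md and async_tx code behave, which is what the regression tests are for.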

Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Focal):
assignee: nobody → Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Hirsute):
assignee: nobody → Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Impish):
assignee: nobody → Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux (Ubuntu Impish):
status: New → In Progress
Changed in kunpeng920:
status: Incomplete → In Progress
assignee: nobody → Ike Panhc (ikepanhc)
Ike Panhc (ikepanhc)
description: updated
Ike Panhc (ikepanhc) wrote :
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in kunpeng920:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Ike Panhc (ikepanhc) wrote :

Thanks. The 5.4.0-85.95 kernel works fine with software RAID5 on kunpeng920.

$ lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0     7:0     0     62M 1  loop  /snap/lxd/21032
loop1     7:1     0   48.9M 1  loop  /snap/core18/2127
loop2     7:2     0   28.1M 1  loop  /snap/snapd/12707
sda       8:0     0  557.9G 0  disk
├─sda1    8:1     0    512M 0  part  /boot/efi
└─sda2    8:2     0  557.4G 0  part  /
sdb       8:16    0  557.9G 0  disk
├─sdb1    8:17    0    100G 0  part
│ └─md0   9:0     0  199.9G 0  raid5
├─sdb2    8:18    0    100G 0  part
│ └─md0   9:0     0  199.9G 0  raid5
└─sdb3    8:19    0    100G 0  part
  └─md0   9:0     0  199.9G 0  raid5
$ uname -a
Linux saenger 5.4.0-85-generic #95-Ubuntu SMP Fri Sep 3 16:13:17 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
$ dmesg | tail -3
[ 531.883803] md0: detected capacity change from 0 to 214612049920
[ 531.883914] md: recovery of RAID array md0
[ 739.851754] hns3 0000:7d:00.0 enp125s0f0: link down

tags: added: verification-done-focal
removed: verification-needed-focal
Ike Panhc (ikepanhc) wrote :

Thanks. The 5.11.0-35.37 kernel works fine with software RAID5 on kunpeng920.

$ lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0     7:0     0   57.4M 1  loop  /snap/core20/1084
loop1     7:1     0   28.1M 1  loop  /snap/snapd/12707
loop2     7:2     0     61M 1  loop  /snap/lxd/21042
loop3     7:3     0   28.1M 1  loop  /snap/snapd/12886
loop4     7:4     0   65.1M 1  loop  /snap/lxd/21462
sda       8:0     0  894.3G 0  disk
├─sda1    8:1     0    100G 0  part
│ └─md127 9:127   0  199.9G 0  raid5
├─sda2    8:2     0    100G 0  part
│ └─md127 9:127   0  199.9G 0  raid5
└─sda3    8:3     0    100G 0  part
  └─md127 9:127   0  199.9G 0  raid5
sdb       8:16    0  894.3G 0  disk
├─sdb1    8:17    0    512M 0  part  /boot/efi
└─sdb2    8:18    0  893.8G 0  part  /
$ uname -a
Linux segers 5.11.0-35-generic #37-Ubuntu SMP Fri Sep 3 14:00:38 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
$ sudo dmesg | grep raid | tail -3
[ 21.269566] md/raid:md127: device sda2 operational as raid disk 1
[ 21.298316] md/raid:md127: device sda1 operational as raid disk 0
[ 21.305059] md/raid:md127: raid level 5 active with 2 out of 3 devices, algorithm 2
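
The last line above reports "2 out of 3 devices", meaning the array started degraded. That is expected while mdadm's initial recovery of a newly created RAID5 array is still running (the 5.4 log shows "md: recovery of RAID array md0"), but it is worth distinguishing from a real member failure when reading verification logs. A minimal sketch that pulls the level and member counts from such md/raid lines; the pattern is copied from the log above and the helper names are hypothetical.

```python
import re

_LEVEL = re.compile(
    r"md/raid:(?P<array>md\w+): raid level (?P<level>\d+) active with "
    r"(?P<active>\d+) out of (?P<total>\d+) devices"
)


def raid_status(dmesg: str) -> list[tuple[str, int, int, int]]:
    """Return (array, level, active, total) for each 'raid level N active' line."""
    return [
        (m["array"], int(m["level"]), int(m["active"]), int(m["total"]))
        for m in _LEVEL.finditer(dmesg)
    ]


def degraded(status: tuple[str, int, int, int]) -> bool:
    """An array is degraded when fewer members are active than it has slots."""
    _, _, active, total = status
    return active < total
```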

tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-16.16

---------------
linux (5.13.0-16.16) impish; urgency=medium

  * impish/linux: 5.13.0-16.16 -proposed tracker (LP: #1942611)

  * Miscellaneous Ubuntu changes
    - [Config] update toolchain in configs

  * Miscellaneous upstream changes
    - Revert "UBUNTU: [Config] Enable CONFIG_UBSAN_BOUNDS"

 -- Andrea Righi <email address hidden> Fri, 03 Sep 2021 16:21:14 +0200

Changed in linux (Ubuntu Impish):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-88.99

---------------
linux (5.4.0-88.99) focal; urgency=medium

  * focal/linux: 5.4.0-88.99 -proposed tracker (LP: #1944747)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.09.06)

  * please drop virtualbox-guest-dkms virtualbox-guest-source (LP: #1933248)
    - Revert "UBUNTU: [Config] Disable virtualbox dkms build"

linux (5.4.0-87.98) focal; urgency=medium

  * please drop virtualbox-guest-dkms virtualbox-guest-source (LP: #1933248)
    - [Config] Disable virtualbox dkms build

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.09.06)

  * LRMv5: switch primary version handling to kernel-versions data set
    (LP: #1928921)
    - [Packaging] switch to kernel-versions

  * disable “CONFIG_HISI_DMA” config for ubuntu version (LP: #1936771)
    - Disable CONFIG_HISI_DMA
    - [Config] Record hisi_dma no longer built for arm64

  * memory leaking when removing a profile (LP: #1939915)
    - apparmor: Fix memory leak of profile proxy

  * CryptoExpress EP11 cards are going offline (LP: #1939618)
    - s390/zcrypt: Support for CCA protected key block version 2
    - s390: Replace zero-length array with flexible-array member
    - s390/zcrypt: Use scnprintf() for avoiding potential buffer overflow
    - s390/zcrypt: replace snprintf/sprintf with scnprintf
    - s390/ap: Remove ap device suspend and resume callbacks
    - s390/zcrypt: use fallthrough;
    - s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc
    - s390/ap: remove power management code from ap bus and drivers
    - s390/ap: introduce new ap function ap_get_qdev()
    - s390/zcrypt: use kzalloc
    - s390/zcrypt: fix smatch warnings
    - s390/zcrypt: code beautification and struct field renames
    - s390/zcrypt: split ioctl function into smaller code units
    - s390/ap: rename and clarify ap state machine related stuff
    - s390/zcrypt: provide cex4 cca sysfs attributes for cex3
    - s390/ap: rework crypto config info and default domain code
    - s390/zcrypt: simplify cca_findcard2 loop code
    - s390/zcrypt: remove set_fs() invocation in zcrypt device driver
    - s390/ap: remove unnecessary spin_lock_init()
    - s390/zcrypt: Support for CCA APKA master keys
    - s390/zcrypt: introduce msg tracking in zcrypt functions
    - s390/ap: split ap queue state machine state from device state
    - s390/ap: add error response code field for ap queue devices
    - s390/ap: add card/queue deconfig state
    - s390/sclp: Add support for SCLP AP adapter config/deconfig
    - s390/ap: Support AP card SCLP config and deconfig operations
    - s390/ap/zcrypt: revisit ap and zcrypt error handling
    - s390/zcrypt: move ap_msg param one level up the call chain
    - s390/zcrypt: Introduce Failure Injection feature
    - s390/zcrypt: fix wrong format specifications
    - s390/ap: fix ap devices reference counting
    - s390/zcrypt: return EIO when msg retry limit reached
    - s390/zcrypt: fix zcard and zqueue hot-unplug memleak
    - s390/ap: Fix hanging ioctl caused by wrong msg counter

  * memfd from ubuntu_kernel_s...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-37.41

---------------
linux (5.11.0-37.41) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-37.41 -proposed tracker (LP: #1944180)

  * CVE-2021-41073
    - io_uring: ensure symmetry in handling iter types in loop_rw_iter()

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.09.06)

  * LRMv5: switch primary version handling to kernel-versions data set
    (LP: #1928921)
    - [Packaging] switch to kernel-versions

  * disable “CONFIG_HISI_DMA” config for ubuntu version (LP: #1936771)
    - Disable CONFIG_HISI_DMA
    - [Config] Record hisi_dma no longer built for arm64

  * ubunut_kernel_selftests: memory-hotplug: avoid spamming logs with
    dump_page() (LP: #1941829)
    - selftests: memory-hotplug: avoid spamming logs with dump_page(), ratio limit
      hot-remove error test

  * alsa: the soundwire audio doesn't work on the Dell TGL-H machines
    (LP: #1941669)
    - ASoC: SOF: allow soundwire use desc->default_fw_filename
    - ASoC: Intel: tgl: remove sof_fw_filename set for tgl_3_in_1_default

  * e1000e blocks the boot process when it tried to write checksum to its NVM
    (LP: #1936998)
    - e1000e: Do not take care about recovery NVM checksum

  * Dell XPS 17 (9710) PCI/internal sound card not detected (LP: #1935850)
    - ASoC: Intel: sof_sdw: include rt711.h for RT711 JD mode
    - ASoC: Intel: sof_sdw: add quirk for Dell XPS 9710

  * mute/micmute LEDs no function on HP ProBook 650 G8 (LP: #1939473)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 650 G8 Notebook PC

  * Fix mic noise on HP ProBook 445 G8 (LP: #1940610)
    - ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8

  * GPIO error logs in start and dmesg after update of kernel (LP: #1937897)
    - ODM: mfd: Check AAEON BFPI version before adding device

  * External displays not working on Thinkpad T490 with ThinkPad Thunderbolt 3
    Dock (LP: #1938999)
    - drm/i915/ilk-glk: Fix link training on links with LTTPRs

  * Fix kernel panic caused by legacy devices on AMD platforms (LP: #1936682)
    - SAUCE: iommu/amd: Keep swiotlb enabled to ensure devices with 32bit DMA
      still work

  * Hirsute update: upstream stable patchset 2021-08-30 (LP: #1942123)
    - drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser"
    - Revert "drm/i915: Propagate errors on awaiting already signaled fences"
    - regulator: rtmv20: Fix wrong mask for strobe-polarity-high
    - regulator: rt5033: Fix n_voltages settings for BUCK and LDO
    - spi: stm32h7: fix full duplex irq handler handling
    - ASoC: tlv320aic31xx: fix reversed bclk/wclk master bits
    - r8152: Fix potential PM refcount imbalance
    - qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()
    - ASoC: rt5682: Fix the issue of garbled recording after powerd_dbus_suspend
    - net: Fix zero-copy head len calculation.
    - ASoC: ti: j721e-evm: Fix unbalanced domain activity tracking during startup
    - ASoC: ti: j721e-evm: Check for not initialized parent_clk_id
    - efi/mokvar: Reserve the table only if it is in boot services data
    - nvme: fix nvme_setup_command ...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: Fix Committed → Fix Released