hisi_sas driver may oops in prep_ssp_v3_hw()

Bug #1953386 reported by dann frazier
This bug affects 1 person
Affects         Status        Importance  Assigned to   Milestone
kunpeng920      Fix Released  Undecided   dann frazier
Ubuntu-18.04    Fix Released  Undecided   dann frazier
linux (Ubuntu)  Fix Released  Undecided   Unassigned
Bionic          Fix Released  High        dann frazier

Bug Description

[Impact]
The hisi_sas driver occasionally oopses on boot.

[ 32.724666] Unable to handle kernel NULL pointer dereference at virtual address 00000110
[ 32.732720] Mem abort info:
[ 32.735504] ESR = 0x96000004
[ 32.738546] Exception class = DABT (current EL), IL = 32 bits
[ 32.744440] SET = 0, FnV = 0
[ 32.747482] EA = 0, S1PTW = 0
[ 32.750612] Data abort info:
[ 32.753478] ISV = 0, ISS = 0x00000004
[ 32.757298] CM = 0, WnR = 0
[ 32.760256] user pgtable: 4k pages, 48-bit VAs, pgd = (ptrval)
[ 32.766755] [0000000000000110] *pgd=0000000000000000
[ 32.771700] Internal error: Oops: 96000004 [#1] SMP
[ 32.776557] Modules linked in: realtek hibmc_drm aes_ce_blk aes_ce_cipher ttm crct10dif_ce ghash_ce drm_kms_helper ixgbe(+) syscopyarea sha2_ce sysfillrect sysimgblt fb_sys_fops ptp sha256_arm64 sha1_ce hns3 hisi_sas_v3_hw(+) hinic pps_core hisi_sas_main drm hclge mdio libsas ahci hnae3 scsi_transport_sas libahci gpio_dwapb hid_generic usbhid hid aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[ 32.811755] Process kworker/u256:1 (pid: 1280, stack limit = 0x (ptrval))
[ 32.819118] CPU: 66 PID: 1280 Comm: kworker/u256:1 Not tainted 4.15.18+ #24
[ 32.826047] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 32.834884] Workqueue: 0000:74:02.0_disco_q sas_discover_domain [libsas]
[ 32.839182] hns3 0000:bd:00.0 enp189s0f0: renamed from eth8
[ 32.841558] pstate: a0c00009 (NzCv daif +PAN +UAO)
[ 32.851878] pc : prep_ssp_v3_hw+0x64/0x340 [hisi_sas_v3_hw]
[ 32.857426] lr : hisi_sas_task_exec.constprop.0+0x304/0x640 [hisi_sas_main]
[ 32.864354] sp : ffff000021833a00
[ 32.867653] x29: ffff000021833a00 x28: ffffb790728621e0
[ 32.872940] x27: ffffb790728607d8 x26: ffffb79072861158
[ 32.878227] x25: ffffd7b07b9340a0 x24: 0000000000000028
[ 32.883515] x23: ffffd7906cd69400 x22: ffffd7906cd69418
[ 32.888802] x21: ffffb79072aad3d0 x20: ffff000021b33000
[ 32.894089] x19: ffffb79072aad3d0 x18: 0000000000000030
[ 32.899376] x17: 000000009e710776 x16: ffff3efb16aabb00
[ 32.904663] x15: ffffffffffffffff x14: ffff3efb976abcef
[ 32.909950] x13: 0000000000000006 x12: ffffb79072863480
[ 32.915237] x11: ffffb79072aad3e0 x10: ffffb79072861148
[ 32.920524] x9 : 0000000000000000 x8 : ffff000024cc0fb0
[ 32.925812] x7 : 0000000000000000 x6 : 000000000000003f
[ 32.931099] x5 : 0000000000000040 x4 : 00000000200000a0
[ 32.936386] x3 : ffffd79071e2d400 x2 : ffff000021833bb4
[ 32.941673] x1 : ffffb79072863460 x0 : 00000000280000a0
[ 32.946960] Call trace:
[ 32.949398] prep_ssp_v3_hw+0x64/0x340 [hisi_sas_v3_hw]
[ 32.954600] hisi_sas_task_exec.constprop.0+0x304/0x640 [hisi_sas_main]
[ 32.961184] hisi_sas_exec_internal_tmf_task+0xec/0x290 [hisi_sas_main]
[ 32.967767] hisi_sas_init_device+0x84/0x100 [hisi_sas_main]
[ 32.973401] hisi_sas_dev_found+0xa4/0x24c [hisi_sas_main]
[ 32.978864] sas_notify_lldd_dev_found+0x44/0xc0 [libsas]
[ 32.984239] sas_discover_end_dev+0x24/0x30 [libsas]
[ 32.989182] sas_ex_discover_devices+0x950/0xbfc [libsas]
[ 32.994557] sas_discover_root_expander+0x12c/0x150 [libsas]
[ 33.000192] sas_discover_domain+0x340/0x664 [libsas]
[ 33.005225] process_one_work+0x1bc/0x3ec
[ 33.009217] worker_thread+0x58/0x4a0
[ 33.012863] kthread+0x13c/0x170
[ 33.016077] ret_from_fork+0x10/0x18
[ 33.019638] Code: 2a004820 2a040000 f9400ed8 f9410061 (3943a319)
[ 33.025705] ---[ end trace da9256b7aa3297ba ]---
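
The ESR value and the faulting instruction word (the parenthesised entry on the Code: line) can be decoded by hand to cross-check the reported fault address. The following is a minimal userspace sketch, not kernel code; the struct and function names are hypothetical, and the bit layouts assumed are the ARMv8-A ESR_EL1 fields (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and the A64 LDRB (immediate, unsigned offset) encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Selected ESR_EL1 fields for a data abort (names are illustrative). */
struct esr_fields {
    unsigned ec;    /* exception class, bits [31:26] */
    unsigned il;    /* instruction length bit, bit [25] */
    uint32_t iss;   /* instruction specific syndrome, bits [24:0] */
    unsigned dfsc;  /* data fault status code, ISS[5:0] */
    unsigned wnr;   /* write-not-read, ISS[6] */
};

static struct esr_fields decode_esr(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.iss  = esr & 0x01ffffff;
    f.dfsc = f.iss & 0x3f;
    f.wnr  = (f.iss >> 6) & 0x1;
    return f;
}

/* Decoded form of an A64 "LDRB Wt, [Xn, #imm]" (unsigned offset). */
struct ldrb_uimm {
    int valid;      /* non-zero if the word matches this encoding */
    unsigned rt;    /* destination register number */
    unsigned rn;    /* base register number */
    uint32_t imm;   /* byte offset (imm12, unscaled for byte loads) */
};

static struct ldrb_uimm decode_ldrb_uimm(uint32_t insn)
{
    struct ldrb_uimm d;

    /* Bits [31:22] of this encoding are 0b0011100101 (0x0E5). */
    d.valid = (insn >> 22) == 0x0E5;
    d.imm   = (insn >> 10) & 0xfff;
    d.rn    = (insn >> 5) & 0x1f;
    d.rt    = insn & 0x1f;
    return d;
}

/* Address the load would access, given the base register's value. */
static uint64_t effective_address(uint64_t base, struct ldrb_uimm d)
{
    assert(d.valid);  /* only meaningful for a decoded LDRB */
    return base + d.imm;
}
```

With the values from the trace, decode_esr(0x96000004) gives EC 0x25 (data abort at the current EL), IL 1 and ISS 0x4 with WnR 0, matching the "Mem abort info" lines. decode_ldrb_uimm(0x3943a319) gives a byte load through x24 at offset 0xe8; with x24 = 0x28 from the register dump, effective_address() yields 0x110, the reported faulting address. So the base pointer held in x24 was not a valid kernel address when the load was issued.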

[Test Case]
Boot a hi1620-based server with its root disk attached to a hisi_sas v3 controller.

[Fix]
e1ba0b0b4451 scsi: hisi_sas: Fix to only call scsi_get_prot_op() for non-NULL scsi_cmnd
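
The call trace passes through hisi_sas_exec_internal_tmf_task(), which suggests the request was generated by the driver itself rather than by the SCSI midlayer, so there may be no usable scsi_cmnd behind it. The guard named in the commit title can be illustrated with a small userspace sketch. This is not the upstream patch: every type and function below is a hypothetical stand-in for the kernel equivalent, and the real prep_ssp_v3_hw() does considerably more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum mock_prot_op { MOCK_PROT_NORMAL = 0, MOCK_PROT_READ_STRIP = 3 };

struct mock_scsi_cmnd {
    enum mock_prot_op prot_op;
};

struct mock_ssp_task {
    struct mock_scsi_cmnd *cmd;  /* assumed NULL for driver-internal requests */
};

/* Models scsi_get_prot_op(): it dereferences cmd unconditionally. */
static enum mock_prot_op mock_get_prot_op(const struct mock_scsi_cmnd *cmd)
{
    assert(cmd != NULL);  /* in the kernel this dereference would oops */
    return cmd->prot_op;
}

/* Returns 1 if protection (DIF) setup would be performed, 0 otherwise. */
static int mock_prep_ssp(const struct mock_ssp_task *task)
{
    const struct mock_scsi_cmnd *cmd = task->cmd;

    /* The guard: only query the protection op when a command exists. */
    if (cmd) {
        if (mock_get_prot_op(cmd) != MOCK_PROT_NORMAL)
            return 1;
    }
    return 0;
}
```

In this sketch a task with cmd set to NULL skips the protection lookup entirely, while ordinary I/O still takes the protection path when its prot_op is not the normal (unprotected) value.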

[Where things could go wrong]
We could be trading one boot-time crash for another that has not shown up in testing.

dann frazier (dannf)
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
Changed in kunpeng920:
assignee: nobody → dann frazier (dannf)
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
dann frazier (dannf)
Changed in kunpeng920:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/4.15.0-167.175 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
dann frazier (dannf) wrote :

Verification:

ubuntu@d06-4:~$ cat /proc/version
Linux version 4.15.0-167-generic (buildd@bos02-arm64-067) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #175-Ubuntu SMP Wed Jan 5 01:58:16 UTC 2022
ubuntu@d06-4:~$ lsmod | grep hisi_sas
hisi_sas_v3_hw 40960 2
hisi_sas_main 40960 1 hisi_sas_v3_hw
libsas 81920 2 hisi_sas_v3_hw,hisi_sas_main
scsi_transport_sas 40960 4 hisi_sas_v3_hw,ses,hisi_sas_main,libsas

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-167.175

---------------
linux (4.15.0-167.175) bionic; urgency=medium

  * bionic/linux: 4.15.0-167.175 -proposed tracker (LP: #1955276)

  * hisi_sas driver may oops in prep_ssp_v3_hw() (LP: #1953386)
    - scsi: hisi_sas: Fix to only call scsi_get_prot_op() for non-NULL scsi_cmnd

  * Bionic update: upstream stable patchset 2021-12-13 (LP: #1954703)
    - xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good
      delay
    - binder: use euid from cred instead of using task
    - Input: elantench - fix misreporting trackpoint coordinates
    - Input: i8042 - Add quirk for Fujitsu Lifebook T725
    - libata: fix read log timeout value
    - ocfs2: fix data corruption on truncate
    - mmc: dw_mmc: Dont wait for DRTO on Write RSP error
    - parisc: Fix ptrace check on syscall return
    - tpm: Check for integer overflow in tpm2_map_response_body()
    - media: ite-cir: IR receiver stop working after receive overflow
    - ALSA: ua101: fix division by zero at probe
    - ALSA: 6fire: fix control and bulk message timeouts
    - ALSA: line6: fix control and interrupt message timeouts
    - ALSA: synth: missing check for possible NULL after the call to kstrdup
    - ALSA: timer: Fix use-after-free problem
    - ALSA: timer: Unconditionally unlink slave instances, too
    - x86/irq: Ensure PI wakeup handler is unregistered before module unload
    - cavium: Return negative value when pci_alloc_irq_vectors() fails
    - scsi: qla2xxx: Fix unmap of already freed sgl
    - cavium: Fix return values of the probe function
    - sfc: Don't use netif_info before net_device setup
    - hyperv/vmbus: include linux/bitops.h
    - mmc: winbond: don't build on M68K
    - bpf: Prevent increasing bpf_jit_limit above max
    - xen/netfront: stop tx queues during live migration
    - spi: spl022: fix Microwire full duplex mode
    - watchdog: Fix OMAP watchdog early handling
    - vmxnet3: do not stop tx queues after netif_device_detach()
    - btrfs: fix lost error handling when replaying directory deletes
    - hwmon: (pmbus/lm25066) Add offset coefficients
    - regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is
      disabled
    - regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-
      dvs-idx property
    - EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
    - mwifiex: fix division by zero in fw download path
    - ath6kl: fix division by zero in send path
    - ath6kl: fix control-message timeout
    - ath10k: fix control-message timeout
    - ath10k: fix division by zero in send path
    - PCI: Mark Atheros QCA6174 to avoid bus reset
    - rtl8187: fix control-message timeouts
    - evm: mark evm_fixmode as __ro_after_init
    - wcn36xx: Fix HT40 capability for 2Ghz band
    - mwifiex: Read a PCI register after writing the TX ring write pointer
    - libata: fix checking of DMA state
    - wcn36xx: handle connection loss indication
    - RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
    - signal: Remove the bogus sigkill_pending in ptrace_stop
    - signal/mips: Update (_save|_restore)_fp_context to fail with -EFAUL...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in kunpeng920:
status: Fix Committed → Fix Released