Kernel hangs during msm init

Bug #1841911 reported by Paolo Pisati
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
linux-snapdragon (Ubuntu)  Invalid       Undecided   Unassigned
  Bionic                   Fix Released  Critical    Unassigned

Bug Description

Impact:

Ubuntu-snapdragon-4.15.0-1061.68 hangs during boot around msm init.
Sometimes we get the following stack trace, or the boot completes and the board hangs during reboot:

...
[ 8.113018] msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
[ 8.131081] msm 1a00000.mdss: failed to bind 1a98000.dsi (ops dsi_ops [msm]): -517
[ 8.138234] msm 1a00000.mdss: master bind failed: -517
[ 8.145551] platform 1a01000.mdp: Dropping the link to 1ef0000.iommu
[ 8.150545] iommu: Removing device 1a01000.mdp from group 1
[ 8.157051] ------------[ cut here ]------------
[ 8.162369] WARNING: CPU: 1 PID: 1316 at /build/linux-snapdragon-t5G9R3/linux-snapdragon-4.15.0/drivers/iommu/qcom_iommu.c:336 qcom_iommu_domain_free+0x74/0x88
[ 8.167166] Modules linked in: adv7511_drm cec rc_core msm(+) mdt_loader
[ 8.181137] CPU: 1 PID: 1316 Comm: systemd-udevd Not tainted 4.15.0-1061-snapdragon #68-Ubuntu
[ 8.188079] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 8.196501] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 8.203356] pc : qcom_iommu_domain_free+0x74/0x88
[ 8.207955] lr : qcom_iommu_domain_free+0x74/0x88
[ 8.212727] sp : ffff00000cbeb680
[ 8.217412] x29: ffff00000cbeb680 x28: ffff8000396d84b8
[ 8.220713] x27: ffff8000396d84b0 x26: ffff8000396d84c0
[ 8.226096] x25: ffff80003d057c10 x24: ffff8000396d8420
[ 8.231391] x23: 0000000000000003 x22: ffff80003ce40258
[ 8.236686] x21: ffff80000203ad00 x20: ffff80000203af30
[ 8.241981] x19: ffff80000203af00 x18: ffffffffffffffff
[ 8.247275] x17: 0000000000000000 x16: 0000000000000004
[ 8.252570] x15: ffff000009549c08 x14: 0720072007200720
[ 8.257866] x13: 0720072007200720 x12: 0720072007200720
[ 8.263161] x11: ffff000009549e80 x10: ffff00000871d340
[ 8.268456] x9 : 0720072007200720 x8 : 0000000000000005
[ 8.273751] x7 : 0720072d072d072d x6 : 000000000000014c
[ 8.279046] x5 : ffff000008610250 x4 : 0000000000000000
[ 8.284345] x3 : 0000000000000000 x2 : a59fa8ece8469a00
[ 8.289637] x1 : 0000000000000000 x0 : 0000000000000024
[ 8.294932] Call trace:
[ 8.300227] qcom_iommu_domain_free+0x74/0x88
[ 8.302400] iommu_group_release+0x54/0x90
[ 8.306914] kobject_put+0x8c/0x1f0
[ 8.310905] kobject_del.part.0+0x3c/0x50
[ 8.314292] kobject_put+0x74/0x1f0
[ 8.318455] iommu_group_remove_device+0x10c/0x198
[ 8.321756] qcom_iommu_remove_device+0x58/0x70
[ 8.326617] iommu_bus_notifier+0xa8/0x120
[ 8.331045] notifier_call_chain+0x5c/0xa0
[ 8.335210] blocking_notifier_call_chain+0x64/0x88
[ 8.339294] device_del+0x234/0x368
[ 8.344066] platform_device_del.part.3+0x2c/0x98
[ 8.347539] platform_device_unregister+0x24/0x38
[ 8.352410] of_platform_device_destroy+0xb8/0xc0
[ 8.357087] device_for_each_child+0x58/0xb0
[ 8.361775] of_platform_depopulate+0x4c/0x68
[ 8.366350] msm_pdev_probe+0x2c4/0x388 [msm]
[ 8.370369] platform_drv_probe+0x60/0xc0
[ 8.374707] driver_probe_device+0x2ec/0x458
[ 8.378701] __driver_attach+0xdc/0x128
[ 8.383042] bus_for_each_dev+0x78/0xd8
[ 8.386598] driver_attach+0x30/0x40
[ 8.390418] bus_add_driver+0x20c/0x2a8
[ 8.394237] driver_register+0x6c/0x110
[ 8.397797] __platform_driver_register+0x54/0x60
[ 8.401841] msm_drm_register+0x54/0x80 [msm]
[ 8.406481] do_one_initcall+0x58/0x160
[ 8.410818] do_init_module+0x64/0x1d8
[ 8.414463] load_module+0x1378/0x15c8
[ 8.418282] SyS_finit_module+0x100/0x118
[ 8.422016] el0_svc_naked+0x30/0x34
[ 8.426095] ---[ end trace 800d0885aa276bfd ]---

Fix:

During the Ubuntu-snapdragon-4.15.0-1061.68 cycle, we picked up an upstream stable patch that makes msm call of_platform_depopulate() on probe deferral (and during removal). That patch triggers a WARN_ON() during the wind down of the IOMMU, followed by the kernel hang. Unless we want to backport the new msm dri driver (and all the relevant dependencies), the fix is to revert the stable patch that calls of_platform_depopulate().
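
To confirm from a captured log that the warning is reached through of_platform_depopulate() called from msm_pdev_probe(), the call trace can be checked mechanically. The following is only a minimal sketch: the function names trace_symbols and depopulate_reaches_iommu_free are made up for illustration, and the frame format is assumed to match the "symbol+0xoff/0xsize" lines quoted in this report.

```python
import re

# Matches one call-trace frame such as
# "[ 8.366350] msm_pdev_probe+0x2c4/0x388 [msm]" and captures the symbol.
# Register dumps ("pc : ...") and the WARNING line do not match.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(log_text):
    """Return the symbols of all trace frames, in log order (innermost first)."""
    symbols = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbols.append(match.group(1))
    return symbols


def depopulate_reaches_iommu_free(log_text):
    """True if qcom_iommu_domain_free appears below of_platform_depopulate,
    which in turn appears below msm_pdev_probe, in the call trace."""
    syms = trace_symbols(log_text)
    try:
        free_idx = syms.index("qcom_iommu_domain_free")
        depop_idx = syms.index("of_platform_depopulate")
        probe_idx = syms.index("msm_pdev_probe")
    except ValueError:
        # One of the three frames is missing: not the path described here.
        return False
    # Frames are printed innermost first, so the callee has the lower index.
    return free_idx < depop_idx < probe_idx
```

Run against the trace in this report, the three frames appear in that order, which matches the description above.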

How to test:

Boot a patched kernel and check that the stack trace no longer shows up.
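
The check can be scripted against a saved boot log. This is a rough sketch, not part of the original test procedure: find_iommu_warnings is a hypothetical helper name, and the pattern is assumed from the WARNING lines quoted in this report (the source line number differs between kernels, so it is matched loosely).

```python
import re

# WARN_ON() line as quoted in this report, e.g.
# "WARNING: CPU: 1 PID: 1316 at .../drivers/iommu/qcom_iommu.c:336 qcom_iommu_domain_free+0x74/0x88"
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at \S*qcom_iommu\.c:\d+ qcom_iommu_domain_free"
)


def find_iommu_warnings(log_text):
    """Return the log lines that contain the qcom_iommu_domain_free warning."""
    return [line for line in log_text.splitlines() if WARN_RE.search(line)]
```

An empty result for the dmesg output of a patched kernel suggests the warning is gone. It does not cover the case where the board hangs during reboot without printing the trace.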

Regression:

None; I'm reverting a patch that wasn't there before and clearly wasn't tested with our downstream BSP.


Paolo Pisati (p-pisati)
description: updated
description: updated
Paolo Pisati (p-pisati)
summary: - Kernel crash during msm init (or during shutdown)
+ Kernel hangs during msm init
Changed in linux-snapdragon (Ubuntu):
status: New → Invalid
Changed in linux-snapdragon (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Critical
Khaled El Mously (kmously) wrote :

@Paolo: I'm just wondering if

a) There's some testing improvement we can do to catch this kind of thing pre-release
and
b) why that drm driver is getting this fix via stable updates but the driver itself would require "backport(ing) of the new msm dri driver (and all the relevant dependencies)".. Shouldn't that driver already be in our bionic tree if it's getting this fix via stable updates? (Although I just realized as I'm writing this that bionic gets updates from 4.16 too so maybe that's why. Still easier to just ask :) )

Changed in linux-snapdragon (Ubuntu Bionic):
status: In Progress → Fix Committed
Naresh Kamboju (naresh-kamboju) wrote :

Linaro's test farm also noticed this problem and reported an internal bug on a dragonboard-410c device running the 5.3.0-rc6 mainline kernel.

Bug 5460 - mainline: dragonboard-410c: WARNING at qcom_iommu.c:325 qcom_iommu_domain_free+0x74/0x88
https://bugs.linaro.org/show_bug.cgi?id=5460

Booting the mainline kernel on dragonboard-410c triggers this kernel warning.

[ 9.795563] msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
[ 9.802295] msm 1a00000.mdss: failed to bind 1a98000.dsi (ops dsi_ops [msm]): -517
[ 9.803459] msm 1a00000.mdss: master bind failed: -517
[ 9.812313] platform 1a01000.mdp: Removing from iommu group 1
[ 9.816418] ------------[ cut here ]------------
[ 9.822824] WARNING: CPU: 2 PID: 247 at /usr/src/kernel/drivers/iommu/qcom_iommu.c:325 qcom_iommu_domain_free+0x74/0x88
[ 9.825039] Modules linked in: adv7511(+) cec msm(+) mdt_loader drm_kms_helper drm drm_panel_orientation_quirks qrtr fuse
[ 9.835546] CPU: 2 PID: 247 Comm: systemd-udevd Not tainted 5.3.0-rc6 #1
[ 9.847025] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 9.847047] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 9.847062] pc : qcom_iommu_domain_free+0x74/0x88
[ 9.847072] lr : qcom_iommu_domain_free+0x74/0x88
[ 9.847078] sp : ffff000012fcb5a0
[ 9.874161] x29: ffff000012fcb5a0 x28: 0000000000000000
[ 9.877635] x27: ffff00000923ef10 x26: ffff00001157ae08
[ 9.882930] x25: ffff8000352a99a0 x24: ffff8000352a9998
[ 9.888225] x23: ffff000011f800a0 x22: ffff000012212f20
[ 9.893519] x21: ffff80003458ba80 x20: ffff8000019de6c0
[ 9.898814] x19: ffff8000019de600 x18: ffffffffffffffff
[ 9.904110] x17: 0000000000000000 x16: 0000000000000000
[ 9.909405] x15: ffff000011f7f848 x14: 00313d4e5f454c42
[ 9.914700] x13: 0000000000000040 x12: 0000000000000228
[ 9.919994] x11: ffff000012258000 x10: 0000000000000050
[ 9.925289] x9 : 0000000000000000 x8 : ffff000011f7f848
[ 9.930586] x7 : 0000000079bb0468 x6 : ffff80003fc80400
[ 9.935880] x5 : ffff80003fc80400 x4 : ffff800038104500
[ 9.941175] x3 : ffff000011f80000 x2 : 01212b45321d0900
[ 9.946470] x1 : 0000000000000000 x0 : 0000000000000024
[ 9.951767] Call trace:
[ OK ] Found device /dev/ttyMSM0.
[ 9.957064] qcom_iommu_domain_free+0x74/0x88
[ 9.968809] kobject_del+0x50/0x68
[ 9.968865] kobject_put+0xd8/0xf8
[ 9.971383] iommu_group_remove_device+0x14c/0x298
[ 9.974769] qcom_iommu_remove_device+0x58/0x70
[ 9.979545] iommu_release_device+0x34/0x50
[ 9.983970] iommu_bus_notifier+0xf0/0x110
[ 9.988135] notifier_call_chain+0x5c/0xa0
[ 9.992305] blocking_notifier_call_chain+0x68/0x88
[ 9.996384] device_del+0x248/0x378
[ 10.001157] platform_device_del.part.3+0x20/0x98
[ 10.004631] platform_device_unregister+0x2c/0x40
[ 10.009492] of_platform_device_destroy+0xc0/0xc8
[ 10.014180] device_for_each_child+0x68/0xb0
[ 10.018867] of_platform_depopulate+0x4c/0x68
[ 10.023356] msm_pdev_probe+0x1d8/0x358 [msm]
[ 10.027466] platform_drv_probe+0x58/0xa8
[ 10.031801] really_probe+0xd8/0x2b0
[ 10.035794] driver_probe_device+0x5c/0x108
[ 10.039439] device_driver_attach...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-snapdragon - 4.15.0-1062.69

---------------
linux-snapdragon (4.15.0-1062.69) bionic; urgency=medium

  * bionic/linux-snapdragon: 4.15.0-1062.69 -proposed tracker (LP: #1842713)

  * Kernel hangs during msm init (LP: #1841911)
    - Revert "drm/msm: Depopulate platform on probe failure"

  * Dragonboard fails to boot: hangs after SMMU init (LP: #1841893)
    - Revert "iommu/arm-smmu: Add support for qcom, smmu-v2 variant"
    - iommu/arm-smmu: Add support for qcom,smmu-v2 variant

  [ Ubuntu: 4.15.0-62.69 ]

  * bionic/linux: 4.15.0-62.69 -proposed tracker (LP: #1842746)
  * Kernel Panic with linux-image-4.15.0-60-generic when specifying nameserver
    in docker-compose (LP: #1842447)
    - ip: frags: fix crash in ip_do_fragment()

  [ Ubuntu: 4.15.0-60.67 ]

  * bionic/linux: 4.15.0-60.67 -proposed tracker (LP: #1841086)
  * [Regression] net test from ubuntu_kernel_selftests failed due to bpf test
    compilation issue (LP: #1840935)
    - SAUCE: Fix "bpf: relax verifier restriction on BPF_MOV | BPF_ALU"
  * [Regression] failed to compile seccomp test from ubuntu_kernel_selftests
    (LP: #1840932)
    - Revert "selftests: skip seccomp get_metadata test if not real root"
  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis

linux-snapdragon (4.15.0-1061.68) bionic; urgency=medium

  * bionic/linux-snapdragon: 4.15.0-1061.68 -proposed tracker (LP: #1839979)

  * Bionic update: upstream stable patchset 2019-07-25 (LP: #1837952)
    - [Config] snapdragon: updateconfigs for CONFIG_SUN50I_ERRATUM_UNKNOWN1

  * Bionic update: upstream stable patchset 2019-08-02 (LP: #1838824)
    - [Config] snapdragon: updateconfigs for CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT

  * Bionic update: upstream stable patchset 2019-07-26 (LP: #1838116)
    - [Config] snapdragon: updateconfigs for CONFIG_LDISC_AUTOLOAD
    - [Config] snapdragon: updateconfigs for CONFIG_R3964 (BROKEN)

  [ Ubuntu: 4.15.0-59.66 ]

  * bionic/linux: 4.15.0-59.66 -proposed tracker (LP: #1840006)
  * zfs not completely removed from bionic tree (LP: #1840051)
    - SAUCE: (noup) remove completely the zfs code
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * [18.04 FEAT] Enhanced hardware support (LP: #1836857)
    - s390: report new CPU capabilities
    - s390: add alignment hints to vector load and store
  * [18.04 FEAT] Enhanced CPU-MF hardware counters - kernel part (LP: #1836860)
    - s390/cpum_cf: Add support for CPU-MF SVN 6
    - s390/cpumf: Add extended counter set definitions for model 8561 and 8562
  * ideapad_laptop disables WiFi/BT radios on Lenovo Y540 (LP: #1837136)
    - platform/x86: ideapad-laptop: Remove no_hw_rfkill_list
  * Stacked onexec transitions fail when under NO NEW PRIVS restrictions
    (LP: #1839037)
    - SAUCE: apparmor: fix nnp subset check failure when, stacking
  * bcache: bch_allocator_thread(): hung task timeout (LP: #1784665) // Tight
    timeout for bcache removal causes spurious failures (LP: #1796292)
    - SAUCE: bcache: fix deadlock in bcache_allocator
  * bcache: bch_allocator_thread(): hung task timeout (LP: #1784665)
    - bcache: never writeback a discard operation...

Changed in linux-snapdragon (Ubuntu Bionic):
status: Fix Committed → Fix Released
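
To tell whether a running bionic snapdragon kernel already carries the revert, the ABI number in the release string can be compared with the fixed package version. A minimal sketch, assuming the release string has the form seen in the trace above ("4.15.0-1061-snapdragon" for package 4.15.0-1061.68); has_revert is a hypothetical helper, not an Ubuntu tool.

```python
import re

# First linux-snapdragon ABI number carrying the revert, taken from the
# changelog entry "linux-snapdragon (4.15.0-1062.69) bionic".
FIXED_ABI = 1062

# Release string of the bionic snapdragon flavour, e.g. "4.15.0-1061-snapdragon".
RELEASE_RE = re.compile(r"^4\.15\.0-(\d+)-snapdragon$")


def has_revert(uname_release):
    """Return True if the release string is at or past the fixed ABI.

    Raises ValueError for any other kernel flavour or series, because the
    comparison is only meaningful for the bionic 4.15.0 snapdragon kernel.
    """
    match = RELEASE_RE.match(uname_release.strip())
    if match is None:
        raise ValueError("not a 4.15.0 snapdragon kernel: %r" % uname_release)
    return int(match.group(1)) >= FIXED_ABI
```

On the board itself the string can be obtained with `uname -r` or, in Python, `platform.release()`.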