In ZZ-BML (POWER9):ubuntu17.04 installation Fails

Bug #1675771 reported by bugproxy
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Tim Gardner
  Zesty          Fix Released   Undecided    Tim Gardner

Bug Description

Ubuntu 17.04 installation fails on ZZ-BML (POWER9) with "rcu_sched detected stalls" call traces.

Steps to reproduce:
1. Kick off the installation.

During package installation, RCU stalls are detected and the installation fails.

Firmware version : FW910.00 (UL910_006)

LOG:

 Installing the base system: 32%
[ 466.603008] Oops: Machine check, sig: 7 [#1]
[ 466.603069] SMP NR_CPUS=2048
[ 466.603071] NUMA
[ 466.603108] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 466.603111] Error detail: Processor Recovery done
[ 466.603113] HMER: 2040000000000000
[ 466.603117] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 466.603119] Error detail: Processor Recovery done
[ 466.603121] HMER: 2040000000000000
[ 466.603123] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 466.603125] Error detail: Processor Recovery done
[ 466.603128] HMER: 2040000000000000
[ 466.603909] PowerNV
[ 466.603963] Modules linked in: xfs jfs btrfs ntfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ipr scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac usb_storage tg3
[ 466.604408] CPU: 16 PID: 15340 Comm: debootstrap Tainted: G M 4.10.0-13-generic #15-Ubuntu
[ 466.604580] task: c0000000031bc800 task.stack: c00000000324c000
[ 466.604704] NIP: c000000000079aa4 LR: c0000000006124c4 CTR: c00000000034a980
[ 466.604851] REGS: c00000000fe7bd80 TRAP: 0200 Tainted: G M (4.10.0-13-generic)
[ 466.605020] MSR: 9000000000209033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 466.605043] CR: 88002881 XER: 20000000
[ 466.605214] CFAR: c000000000079a8c DAR: 00000000525984f0 DSISR: 00000400 SOFTE: 1
[ 466.605214] GPR00: c0000000006124b0 c00000000324fbe0 c00000000143c700 c0000003adf3ea8c
[ 466.605214] GPR04: 00000000525984f0 0000000000000001 c00000000324fd20 ffffffffffffffff
[ 466.605214] GPR08: c000000000000000 c000000000b60000 c000000000000000 c000000000b61060
[ 466.605214] GPR12: c00000000034a980 c00000000fb89000 0000000000000000 0000000000000000
[ 466.605214] GPR16: 0000000000000000 00003ffffd8246d8 000000004b7de900 00003ffffd83fefc
[ 466.605214] GPR20: 000000004b7de8c0 000000004b7e2528 0000000000000002 00000000525984f0
[ 466.605214] GPR24: c00000000324fd70 000000000000ea8c 0000000000000000 0000000000000001
[ 466.605214] GPR28: c00a000000eb7cc0 c0000003adf3ea8c 0000000000000001 c00000000324fd20
[ 466.607454] NIP [c000000000079aa4] __copy_tofrom_user_power7+0x250/0x7cc
[ 466.607687] LR [c0000000006124c4] copy_page_from_iter+0xe4/0x2e0
[ 466.607911] Call Trace:
[ 466.608010] [c00000000324fbe0] [c0000000006124b0] copy_page_from_iter+0xd0/0x2e0 (unreliable)
[ 466.608327] [c00000000324fc50] [c00000000034bca4] pipe_write+0x514/0x560
[ 466.608556] [c00000000324fd00] [c00000000033c8ec] new_sync_write+0xec/0x150
[ 466.608786] [c00000000324fd90] [c00000000033e414] vfs_write+0xd4/0x240
[ 466.609017] [c00000000324fde0] [c00000000033ffc8] SyS_write+0x68/0x110
[ 466.609250] [c00000000324fe30] [c00000000000b184] system_call+0x38/0xe0
[ 466.609476] Instruction dump:
[ 466.609614] 38630008 409d0014 80040000 38840004 90030000 38630004 409e0014 a0040000
[ 466.609894] 38840002 b0030000 38630002 409f000c <88040000> 98030000 38600000 4e800020
[ 466.610181] ---[ end trace cea5171b162d1d13 ]---
[ 466.610372]
[ 487.618915] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 487.619114] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=2626
[ 487.619259] (detected by 14, t=5252 jiffies, g=17122, c=17121, q=3265)
[ 487.619393] Task dump for CPU 16:
[ 487.619468] debootstrap R running task 0 15340 8860 0x00042004
[ 487.619617] Call Trace:
[ 487.619677] [c00000000324faf0] [c00000000324fdd0] 0xc00000000324fdd0 (unreliable)
[ 550.642918] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 550.643121] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=10504
[ 550.643269] (detected by 26, t=21008 jiffies, g=17122, c=17121, q=6735)
[ 550.643404] Task dump for CPU 16:
[ 550.643481] debootstrap R running task 0 15340 8860 0x00042004
[ 550.643635] Call Trace:
[ 550.643694] [c00000000324faf0] [c00000000324fdd0] 0xc00000000324fdd0 (unreliable)
[ 613.666915] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 613.667029] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=18383
[ 613.667099] (detected by 8, t=36764 jiffies, g=17122, c=17121, q=10254)
[ 613.667229] Task dump for CPU 16:
[ 613.667303] debootstrap R running task 0 15340 8860 0x00042004
[ 613.667452] Call Trace:
[ 613.667509] [c00000000324faf0] [c00000000324fdd0] 0xc00000000324fdd0 (unreliable)
[ 676.690913] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 676.691106] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=26262
[ 676.691252] (detected by 5, t=52520 jiffies, g=17122, c=17121, q=13570)
[ 676.691383] Task dump for CPU 16:
[ 676.691457] debootstrap R running task 0 15340 8860 0x00042004
[ 676.691606] Call Trace:
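
The stall reports above can be cross-checked against the machine check. The following is a minimal sketch, not part of the original report: it parses three of the "detected by" lines copied from the log, estimates the tick rate from the jiffies counters, and back-computes when the stall began. The names `parse_stalls` and `estimate_hz` are hypothetical helpers introduced only for this illustration.

```python
import re

# Three consecutive stall reports, copied verbatim from the log above.
LOG = """\
[ 487.619259] (detected by 14, t=5252 jiffies, g=17122, c=17121, q=3265)
[ 550.643269] (detected by 26, t=21008 jiffies, g=17122, c=17121, q=6735)
[ 613.667099] (detected by 8, t=36764 jiffies, g=17122, c=17121, q=10254)
"""

# Hypothetical pattern for this log format only: timestamp, detecting CPU,
# stall duration in jiffies, and grace-period number.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\(detected by (?P<cpu>\d+), "
    r"t=(?P<jiffies>\d+) jiffies, g=(?P<g>\d+)"
)


def parse_stalls(text):
    """Return (timestamp_s, detecting_cpu, stall_jiffies, grace_period) tuples."""
    return [
        (float(m["ts"]), int(m["cpu"]), int(m["jiffies"]), int(m["g"]))
        for m in STALL_RE.finditer(text)
    ]


def estimate_hz(stalls):
    """Estimate ticks per second from the first and last report of one stall."""
    (t0, _, j0, _), (t1, _, j1, _) = stalls[0], stalls[-1]
    return (j1 - j0) / (t1 - t0)


stalls = parse_stalls(LOG)
hz = estimate_hz(stalls)

# Subtract the reported stall duration from the first report's timestamp
# to approximate the moment CPU 16 stopped making progress.
stall_start = stalls[0][0] - stalls[0][2] / hz

print(f"estimated HZ: {round(hz)}")
print(f"stall began near t={stall_start:.1f}s")
```

Under these assumptions, the same grace period (g=17122) appears in every report, and the estimated start of the stall falls within a fraction of a second of the machine check Oops at 466.6 s. That is consistent with the stalls being a consequence of the machine check on CPU 16 rather than a separate problem.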

== Comment: #5 - Breno Henrique Leitao <email address hidden> - 2017-03-24 08:09:57 ==
We need to have these patches included in 17.04:

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=1363875bdb6317a2d0798284d7aaf320f0782f6d

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=c1bbf387d6191e6e18f3adc4db45b922822c2ba4

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=7b9f71f974a12740e79e918cfd58c2fce0b5b580

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-152883 severity-critical targetmilestone-inin1704
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Michael Hohnbaum (hohnbaum) wrote : Re: [Bug 1675771] [NEW] In ZZ-BML (POWER9):ubuntu17.04 installation Fails

Leann,

More kernel patches supporting upcoming hardware.

                  Michael


Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Zesty):
assignee: Taco Screen team (taco-screen-team) → Tim Gardner (timg-tpi)
status: New → Fix Committed
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-03-24 14:57 EDT-------
*** Bug 152890 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy)
tags: removed: bugnameltc-152883 severity-critical
bugproxy (bugproxy)
tags: added: bugnameltc-152883 severity-critical
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-15.17

---------------
linux (4.10.0-15.17) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1675868

  * In ZZ-BML (POWER9):ubuntu17.04 installation Fails (LP: #1675771)
    - powerpc/64s: fix handling of non-synchronous machine checks
    - powerpc/64s: allow machine check handler to set severity and initiator
    - powerpc/64s: POWER9 machine check handler

  * [Feature] R3 mwait support for Knights Mill (LP: #1637550)
    - x86/cpufeature: Enable RING3MWAIT for Knights Landing
    - x86/cpufeature: Enable RING3MWAIT for Knights Mill
    - x86/msr: Add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit
    - x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT
    - x86/cpufeature: Add RING3MWAIT to CPU features

  * [Feature] GLK:New device IDs (LP: #1645951)
    - mfd: intel-lpss: Add Intel Gemini Lake PCI IDs
    - pwm: lpss: Add Intel Gemini Lake PCI ID
    - i2c: i801: Add support for Intel Gemini Lake
    - spi: pxa2xx: Add support for Intel Gemini Lake
    - [Config] CONFIG_PINCTRL_GEMINILAKE=m
    - pinctrl: intel: Add Intel Gemini Lake pin controller support

  * Zesty update to v4.10.5 stable release (LP: #1675032)
    - net/mlx5e: Register/unregister vport representors on interface attach/detach
    - net/mlx5e: Do not reduce LRO WQE size when not using build_skb
    - net/mlx5e: Fix broken CQE compression initialization
    - net/mlx5e: Update MPWQE stride size when modifying CQE compress state
    - net/mlx5e: Fix wrong CQE decompression
    - vxlan: correctly validate VXLAN ID against VXLAN_N_VID
    - vti6: return GRE_KEY for vti6
    - vxlan: don't allow overwrite of config src addr
    - ipv4: add missing initialization for flowi4_uid
    - ipv4: mask tos for input route
    - sctp: set sin_port for addr param when checking duplicate address
    - net sched actions: decrement module reference count after table flush.
    - l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
    - vxlan: lock RCU on TX path
    - geneve: lock RCU on TX path
    - mlxsw: spectrum_router: Avoid potential packets loss
    - net: bridge: allow IPv6 when multicast flood is disabled
    - net: don't call strlen() on the user buffer in packet_bind_spkt()
    - net: net_enable_timestamp() can be called from irq contexts
    - ipv6: orphan skbs in reassembly unit
    - dccp: Unlock sock before calling sk_free()
    - amd-xgbe: Stop the PHY before releasing interrupts
    - amd-xgbe: Be sure to set MDIO modes on device (re)start
    - amd-xgbe: Don't overwrite SFP PHY mod_absent settings
    - bonding: use ETH_MAX_MTU as max mtu
    - strparser: destroy workqueue on module exit
    - tcp: fix various issues for sockets morphing to listen state
    - net: fix socket refcounting in skb_complete_wifi_ack()
    - net: fix socket refcounting in skb_complete_tx_timestamp()
    - net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump
    - dccp: fix use-after-free in dccp_feat_activate_values
    - team: use ETH_MAX_MTU as max mtu
    - vrf: Fix use-after-free in vrf_xmit
    - net/tunnel: set inner protocol in network gro hooks
    - uapi: fix linux/packet_diag.h use...
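
To check whether a given machine already carries the fix, the installed package version can be compared with 4.10.0-15.17. The sketch below is illustrative only: `parse_version` and `has_fix` are hypothetical helpers, the comparison is a plain numeric one rather than full Debian version ordering, and it assumes the failing kernel in the log ("4.10.0-13-generic #15-Ubuntu") corresponds to package version 4.10.0-13.15.

```python
import re

# Version named in the changelog entry above.
FIXED = "4.10.0-15.17"


def parse_version(version):
    """Split a version such as '4.10.0-15.17' into a tuple of integers."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def has_fix(pkg_version, fixed=FIXED):
    """Return True if pkg_version is at or past the fixed version.

    Simplified numeric comparison; it does not handle epochs, '~' suffixes,
    or other Debian version syntax.
    """
    return parse_version(pkg_version) >= parse_version(fixed)


# Assumed package version of the kernel that failed in the log.
print(has_fix("4.10.0-13.15"))
# Version from the changelog.
print(has_fix("4.10.0-15.17"))
```

On a real system, `dpkg --compare-versions` is the more reliable way to do this comparison.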


Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
bugproxy (bugproxy)
tags: added: targetmilestone-inin16043
removed: targetmilestone-inin1704
Brad Figg (brad-figg)
tags: added: cscc
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-12-10 04:13 EDT-------
Rejecting due to lack of response from tester.
