powerpc: opal machine checks lead to kernel oops and application SIGSEGV
Bug #1301424 reported by Andy Whitcroft
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | High | Andy Whitcroft |
Trusty | Fix Released | High | Andy Whitcroft |
Bug Description
We're suffering kernel Oopses on multiple Power machines running
Ubuntu 14.04 beta levels, with kernels ranging from (at least) 3.13.0-16
through 3.13.0-19. We are running Ubuntu directly on top of OPAL,
without KVM.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
importance: | Undecided → High |
assignee: | nobody → Andy Whitcroft (apw) |
status: | Confirmed → In Progress |
Changed in linux (Ubuntu Trusty): | |
status: | In Progress → Fix Committed |
After discussions, the recommendation was as follows:
"Below is the list of upstream commits that rewrite the machine check handling:
a68c33f powerpc: Fix endian issues in power7/8 machine check handler
30c8263 Move precessing of MCE queued event out from syscall exit path.
4e243b7 powerpc: Fix "attempt to move .org backwards" error
b63a0ff powerpc/powernv: Machine check exception handling.
28446de powerpc/powernv: Remove machine check handling in OPAL.
b5ff421 powerpc/book3s: Queue up and process delayed MCE events.
36df96f powerpc/book3s: Decode and save machine check event.
ae744f3 powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on
e22a227 powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on
0440705 powerpc/book3s: Add flush_tlb operation in cpu_spec.
4c70341 powerpc/book3s: Introduce a early machine check hook in cpu_spec.
1c51089 powerpc/book3s: Return from interrupt if coming from evil context.
1e9b450 powerpc/book3s: handle machine check in Linux host.
729b0f7 powerpc/book3s: Introduce exclusive emergency stack for machine check ex
b14a7253 powerpc/book3s: Split the common exception prolog logic into two sectio
Additional commits that are in linux-next:
ece980f powerpc/book3s: Fix CFAR clobbering issue in machine check handler.
55672ec powerpc/book3s: Recover from MC in sapphire on SCOM read via MMIO.
Below is the link to a critical fix that I posted to ppc-devel yesterday:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-March/116447.html
There is one more critical fix in the machine check handling path on its way, which
I will post soon to ppc-devel after the tests. Will send you the link as
soon as I post that to the ppc-devel and the commit ids once available
upstream."
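For anyone turning the quoted commit list into a backport series, here is a minimal sketch that pulls the abbreviated hashes out of the pasted text and prints `git cherry-pick -x` commands. It assumes the list is in `git log` order (newest first), so it reverses it before emitting commands; that ordering is a guess from the subjects, not something stated in the comment. The helper names `parse_commit_list` and `cherry_pick_commands` are made up for illustration.

```python
import re

# Matches "<abbreviated hash> <subject>" lines as pasted in the comment above.
COMMIT_LINE = re.compile(r"^([0-9a-f]{7,40})\s+(\S.*)$")


def parse_commit_list(text):
    """Return (hash, subject) pairs for every line that looks like a commit entry."""
    commits = []
    for line in text.splitlines():
        match = COMMIT_LINE.match(line.strip())
        if match:
            commits.append((match.group(1), match.group(2)))
    return commits


def cherry_pick_commands(commits, newest_first=True):
    """Build `git cherry-pick -x` command strings, oldest commit first.

    Assumption: the input is newest first (git log order), so it is reversed.
    Pass newest_first=False if the list is already oldest first.
    """
    ordered = list(reversed(commits)) if newest_first else list(commits)
    return [f"git cherry-pick -x {sha}" for sha, _subject in ordered]


if __name__ == "__main__":
    # Three entries copied from the list above; truncated subjects are left as-is.
    sample = """\
a68c33f powerpc: Fix endian issues in power7/8 machine check handler
b63a0ff powerpc/powernv: Machine check exception handling.
b14a7253 powerpc/book3s: Split the common exception prolog logic into two sectio
"""
    for command in cherry_pick_commands(parse_commit_list(sample)):
        print(command)
```

Non-commit lines such as "Additional commits that are in linux-next:" are skipped because they do not start with a lowercase hex string. The script only prints commands; whether each pick applies cleanly to a given 3.13 tree still has to be checked by hand.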