Comment 7 for bug 1951289

dann frazier (dannf) wrote :

I came back to this and found that I can now get a failure w/ error messages when applying the fix (see comment #4) to bionic - see crash log below. So I figured I could just bisect between v4.15 and v5.11 upstream w/ the fix applied and figure out what other change(s) are required to avoid the crash. Unfortunately, I hit a 5.0.0-rc5+ kernel where the same build sometimes crashes (w/ the second backtrace below) and sometimes boots fine. So it seems as though there may be an underlying race. If that race is truly fixed in newer kernels, bisection will probably not be the best tool for finding the fix, since the failure isn't 100% reproducible.
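
To put a rough number on why an intermittent failure hurts bisection: if a bad build only crashes on some fraction of boots, each bisect step needs several clean boots before the build can be marked good with any confidence. A minimal sketch, assuming boots are independent of each other and using made-up crash rates (the real crash rate for this bug has not been measured, and `boots_needed` is a hypothetical helper):

```python
import math


def boots_needed(crash_rate: float, max_miss: float) -> int:
    """Return the smallest number of clean boots n such that a bad build
    with the given per-boot crash rate would survive all n boots with
    probability at most max_miss, i.e. (1 - crash_rate) ** n <= max_miss.

    Assumes every boot is an independent trial, which may not hold for a
    real race condition.
    """
    if not 0.0 < crash_rate < 1.0:
        raise ValueError("crash_rate must be strictly between 0 and 1")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss) / math.log(1.0 - crash_rate))


if __name__ == "__main__":
    # Hypothetical crash rates, chosen only for illustration.
    for rate in (0.5, 0.2, 0.05):
        n = boots_needed(rate, max_miss=0.01)
        print(f"crash rate {rate:.2f}: {n} clean boots per bisect step")
```

The lower the crash rate, the more boots every step costs, and a single wrongly marked "good" commit sends the bisection down the wrong half of the history.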

== bionic kernel w/ patch applied ==
[ 12.160242] CPU: All CPU(s) started at EL2
[ 12.165438] alternatives: patching kernel code
[ 12.186187] Unable to handle kernel paging request at virtual address 8dcaae1e1004
[ 12.194589] Mem abort info:
[ 12.197676] ESR = 0x96000004
[ 12.201055] Exception class = DABT (current EL), IL = 32 bits
[ 12.207619] SET = 0, FnV = 0
[ 12.210996] EA = 0, S1PTW = 0
[ 12.214471] Data abort info:
[ 12.217654] ISV = 0, ISS = 0x00000004
[ 12.221902] CM = 0, WnR = 0
[ 12.225186] [00008dcaae1e1004] user address but active_mm is swapper
[ 12.232238] Internal error: Oops: 96000004 [#1] SMP
[ 12.237644] Modules linked in:
[ 12.241026] Process swapper/0 (pid: 1, stack limit = 0x (ptrval))
[ 12.248459] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.18+ #1
[ 12.255216] pstate: 80800009 (Nzcv daif -PAN +UAO)
[ 12.260531] pc : build_sched_domains+0xb04/0xfd0
[ 12.265651] lr : build_sched_domains+0xae0/0xfd0
[ 12.270768] sp : ffff00000843bd20
[ 12.274434] x29: ffff00000843bd20 x28: ffffc5dfb98c0f80
[ 12.280320] x27: 00000000ffffffff x26: ffff3815115f2000
[ 12.286211] x25: 0000000000000100 x24: 0000000000000000
[ 12.292102] x23: ffff381511d69894 x22: ffffc5dfb9891600
[ 12.297988] x21: ffff381511d68e38 x20: ffffe5ffbb5fd200
[ 12.303880] x19: 0000000000000000 x18: ffffc5dfbfaec188
[ 12.309767] x17: 000000004cae2fed x16: 00000000804179ac
[ 12.315658] x15: 00000000bcf71eef x14: 0000000085f50aeb
[ 12.321546] x13: 0000000021ce98a4 x12: 00000000ffffff80
[ 12.327433] x11: ffff7f97feee5500 x10: 00000000fb44ed3c
[ 12.333319] x9 : 0000000000003b1b x8 : 0000000000000000
[ 12.339205] x7 : ffffc5dfbe007c00 x6 : 0000000000000002
[ 12.345098] x5 : ffffffffffffffff x4 : 0000000000000000
[ 12.350986] x3 : 0000000000000000 x2 : 00008dcaae1e1000
[ 12.356871] x1 : 0000000000000004 x0 : 0000000000000004
[ 12.362761] Call trace:
[ 12.365463] build_sched_domains+0xb04/0xfd0
[ 12.370196] sched_init_domains+0x88/0xb0
[ 12.374640] sched_init_smp+0x3c/0x90
[ 12.378696] kernel_init_freeable+0xf4/0x240
[ 12.383432] kernel_init+0x1c/0x114
[ 12.387294] ret_from_fork+0x10/0x18
[ 12.391254] Code: b4000201 93407e78 aa0103e0 f8787aa2 (f8626800)
[ 12.398067] ---[ end trace a7ac5adb59ec4af4 ]---
[ 12.403191] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 12.403191]

== kernel that sometimes boots OK w/ fix applied, sometimes doesn't ==
[ 11.975494] alternatives: patching kernel code
[ 11.985402] Unable to handle kernel paging request at virtual address 00006744c1718004
[ 11.994200] Mem abort info:
[ 11.997287] ESR = 0x96000004
[ 12.000667] Exception class = DABT (current EL), IL = 32 bits
[ 12.007236] SET = 0, FnV = 0
[ 12.010617] EA = 0, S1PTW = 0
[ 12.014092] Data abort info:
[ 12.017278] ISV = 0, ISS = 0x00000004
[ 12.021528] CM = 0, WnR = 0
[ 12.024810] [00006744c1718004] user address but active_mm is swapper
[ 12.031859] Internal error: Oops: 96000004 [#1] SMP
[ 12.037266] Modules linked in:
[ 12.040648] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5+ #7
[ 12.047601] pstate: 80800009 (Nzcv daif -PAN +UAO)
[ 12.052917] pc : build_sched_domains+0x9f4/0x1138
[ 12.058133] lr : build_sched_domains+0x9d0/0x1138
[ 12.063342] sp : ffff00001043bcf0
[ 12.067011] x29: ffff00001043bcf0 x28: ffffb75d3ae21a00
[ 12.072900] x27: ffff50187e5dc730 x26: ffffb75d3a806e80
[ 12.078788] x25: ffff50187e5dd3a4 x24: ffffb75d3a8077a0
[ 12.084675] x23: 0000000000000000 x22: ffff50187e5dd3a4
[ 12.090561] x21: ffff50187e5dc730 x20: ffffd77cfb981400
[ 12.096452] x19: 0000000000000000 x18: 0000000000000014
[ 12.102342] x17: 00000000c60b0fdd x16: 00000000eb2df79d
[ 12.108231] x15: 000000001a6f88f6 x14: 00000000a5b719f8
[ 12.114122] x13: 00000000006ba184 x12: 000000004b281177
[ 12.120013] x11: ffff7f5df3eebf80 x10: 00000000cf4217a7
[ 12.125901] x9 : 0000000000003570 x8 : 0000000000210d00
[ 12.131791] x7 : ffffd77cfbaee580 x6 : 0000000000000002
[ 12.137680] x5 : ffffd77d7fe741c0 x4 : ffffffffffffffff
[ 12.143571] x3 : 0000000000000000 x2 : 00006744c1718000
[ 12.149460] x1 : 0000000000000004 x0 : 0000000000000004
[ 12.155352] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[ 12.162785] Call trace:
[ 12.165490] build_sched_domains+0x9f4/0x1138
[ 12.170314] sched_init_domains+0x88/0xb0
[ 12.174760] sched_init_smp+0x3c/0x90
[ 12.178812] kernel_init_freeable+0x180/0x320
[ 12.183641] kernel_init+0x1c/0x110
[ 12.187502] ret_from_fork+0x10/0x18
[ 12.191460] Code: b4000201 93407e77 aa0103e0 f8777aa2 (f8626800)
[ 12.198259] ---[ end trace 90837fdb22e7ef78 ]---
[ 12.203390] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 12.211906] SMP: stopping secondary CPUs
[ 12.216276] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
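
The two traces can be cross-checked mechanically. The sketch below decodes the ESR value from the logs using the ESR_ELx field layout and checks that the faulting address equals x0 + x2 in each register dump. Reading the faulting instruction word f8626800 as a register-offset load is a hand decode of the A64 encoding, not disassembler output, so treat that part as an assumption. The helper names and the small lookup tables are illustrative only.

```python
# Values copied from the two oops logs above.
ESR = 0x96000004
FAULTING_INSN = 0xF8626800  # the word in parentheses on the "Code:" line

CRASHES = {
    "4.15.18+": {"x0": 0x4, "x2": 0x00008DCAAE1E1000, "fault": 0x00008DCAAE1E1004},
    "5.0.0-rc5+": {"x0": 0x4, "x2": 0x00006744C1718000, "fault": 0x00006744C1718004},
}

# Partial tables: only the entries relevant to these logs.
EC_NAMES = {0x25: "data abort, current EL"}
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields printed by the kernel."""
    iss = esr & 0x1FFFFFF          # bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,   # 1 = 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,   # 0 = read, 1 = write
        "dfsc": iss & 0x3F,        # data fault status code
    }


def load_register_fields(insn: int) -> dict:
    """Extract Rm/Rn/Rt, assuming the A64 'LDR (register)' layout."""
    return {
        "rm": (insn >> 16) & 0x1F,  # offset register
        "rn": (insn >> 5) & 0x1F,   # base register
        "rt": insn & 0x1F,          # destination register
    }


if __name__ == "__main__":
    f = decode_esr(ESR)
    print("EC  :", hex(f["ec"]), EC_NAMES.get(f["ec"], "unknown"))
    print("DFSC:", hex(f["dfsc"]), DFSC_NAMES.get(f["dfsc"], "unknown"))
    print("WnR :", f["wnr"])
    print("insn:", load_register_fields(FAULTING_INSN))
    for kernel, regs in CRASHES.items():
        matches = regs["x0"] + regs["x2"] == regs["fault"]
        print(f"{kernel}: x0 + x2 == fault address -> {matches}")
```

In both dumps the sum matches, which is consistent with the same load faulting in build_sched_domains on both kernels: a read from a small offset (4) added to a large per-boot value in x2.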