armadaXP: Unable to handle kernel NULL pointer dereference at virtual address 00000000 on shutdown

Bug #1090591 reported by C de-Avillez
This bug affects 1 person
Affects                          Status        Importance  Assigned to  Milestone
linux-armadaxp (Ubuntu)          Invalid       Undecided   Unassigned
linux-armadaxp (Ubuntu Quantal)  Fix Released  Undecided   Ike Panhc

Bug Description

During regression testing:

[ OK ] * Unmounting local filesystems...
* Will now restart
[ 132.445491] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 132.446801] CPU2: shutdown
[ 132.447137] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 132.447144] pgd = c0004000
[ 132.447156] Internal error: Oops: 815 [#1] SMP ARM
[ 132.447161] Modules linked in:
[ 132.447170] CPU: 2 Not tainted (3.5.0-1605-armadaxp #7-Ubuntu)
[ 132.447182] PC is at fork_init+0x88/0x98
[ 132.447189] LR is at platform_cpu_die+0x24/0x6c
[ 132.447196] pc : [<c08b0588>] lr : [<c0654420>] psr: 00000093
[ 132.447196] sp : ef063f90 ip : ef063fa8 fp : ef063fa4
[ 132.447199] r10: 00000000 r9 : b30a208e r8 : 225a4670
[ 132.447203] r7 : ffffffff r6 : ef062000 r5 : c093b098 r4 : 00000002
[ 132.447207] r3 : 00000004 r2 : 00000005 r1 : e0ffe0f9 r0 : 00000000
[ 132.447212] Flags: nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 132.447216] Control: 10c53c7d Table: 0000406a DAC: 00000015
[ 132.447221] Process swapper/2 (pid: 0, stack limit = 0xef0622f0)
[ 132.447224] Stack: (0xef063f90 to 0xef064000)
[ 132.447231] 3f80: ef062000 00000002 ef063fbc ef063fa8
[ 132.447240] 3fa0: c06543d8 c0654408 ef062000 c0679fa8 ef063fd4 ef063fc0 c000ede4 c06543ac
[ 132.447248] 3fc0: c090ddf8 00000002 ef063ff4 ef063fd8 c06648ac c000ed94 2f05406a 00000015
[ 132.447257] 3fe0: 10c03c7d c093b78c 00000000 ef063ff8 00664294 c06647b0 eff1cfef f8f9f74f
[ 132.447259] Backtrace:
[ 132.447269] [<c06543fc>] (platform_cpu_die+0x0/0x6c) from [<c06543d8>] (cpu_die+0x38/0x5c)
[ 132.447273] r5:00000002 r4:ef062000
[ 132.447289] [<c06543a0>] (cpu_die+0x0/0x5c) from [<c000ede4>] (cpu_idle+0x5c/0xe8)
[ 132.447292] r5:c0679fa8 r4:ef062000
[ 132.447311] [<c000ed88>] (cpu_idle+0x0/0xe8) from [<c06648ac>] (secondary_start_kernel+0x108/0x12c)
[ 132.447314] r5:00000002 r4:c090ddf8
[ 132.447327] [<c06647a4>] (secondary_start_kernel+0x0/0x12c) from [<00664294>] (0x664294)
[ 132.447330] r7:c093b78c r6:10c03c7d r5:00000015 r4:2f05406a
[ 132.447347] Code: 00000000 00000000 000041ed 00001000 (50ca74b5)
[ 132.447354] ---[ end trace 43c2f171cb9d5246 ]---
[ 132.447360] Kernel panic - not syncing: Attempted to kill the idle task!
[ 132.447146] [00000000] *pgd=00000000
[ 132.645257] pgd = c0004000
[ 132.651551] [00000000] *pgd=00000000
[ 132.655147] Internal error: Oops: 815 [#2] SMP ARM
[ 132.659948] Modules linked in:
[ 132.663022] CPU: 1 Tainted: G D (3.5.0-1605-armadaxp #7-Ubuntu)
[ 132.670178] PC is at fork_init+0x88/0x98
[ 132.674112] LR is at platform_cpu_die+0x24/0x6c
[ 132.678656] pc : [<c08b0588>] lr : [<c0654420>] psr: 00000093
[ 132.678656] sp : ef05ff90 ip : ef05ffa8 fp : ef05ffa4
[ 132.690163] r10: 00000000 r9 : b30a608e r8 : 225a8670
[ 132.695401] r7 : ffffffff r6 : ef05e000 r5 : c093b098 r4 : 00000001
[ 132.701945] r3 : 00000004 r2 : 00000005 r1 : e0ffe0f9 r0 : 00000000
[ 132.708489] Flags: nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 132.715904] Control: 10c53c7d Table: 0000406a DAC: 00000015
[ 132.721665] Process swapper/1 (pid: 0, stack limit = 0xef05e2f0)
[ 132.727686] Stack: (0xef05ff90 to 0xef060000)
[ 132.732057] ff80: ef05e000 00000001 ef05ffbc ef05ffa8
[ 132.740259] ffa0: c06543d8 c0654408 ef05e000 c0679fa8 ef05ffd4 ef05ffc0 c000ede4 c06543ac
[ 132.748461] ffc0: c090ddf8 00000001 ef05fff4 ef05ffd8 c06648ac c000ed94 2f05406a 00000015
[ 132.756663] ffe0: 10c03c7d c093b78c 00000000 ef05fff8 00664294 c06647b0 d6ed4e5f e9fdfdde
[ 132.764860] Backtrace:
[ 132.767327] [<c06543fc>] (platform_cpu_die+0x0/0x6c) from [<c06543d8>] (cpu_die+0x38/0x5c)
[ 132.775611] r5:00000001 r4:ef05e000
[ 132.779225] [<c06543a0>] (cpu_die+0x0/0x5c) from [<c000ede4>] (cpu_idle+0x5c/0xe8)
[ 132.786813] r5:c0679fa8 r4:ef05e000
[ 132.790429] [<c000ed88>] (cpu_idle+0x0/0xe8) from [<c06648ac>] (secondary_start_kernel+0x108/0x12c)
[ 132.799496] r5:00000001 r4:c090ddf8
[ 132.803109] [<c06647a4>] (secondary_start_kernel+0x0/0x12c) from [<00664294>] (0x664294)
[ 132.811218] r7:c093b78c r6:10c03c7d r5:00000015 r4:2f05406a
[ 132.816943] Code: 00000000 00000000 000041ed 00001000 (50ca74b5)
[ 132.823054] ---[ end trace 43c2f171cb9d5247 ]---
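For reference when reading the "Internal error: Oops: 815" lines above: the number is the fault status register value in hex. The following is a minimal sketch of how it can be decoded, assuming the ARMv7 short-descriptor (non-LPAE) layout; the struct and function names are illustrative and not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a fault status register value, assuming the ARMv7
 * short-descriptor format (an assumption, not stated in the report). */
struct fsr_fields {
    unsigned status;  /* FS[4:0]: bit 10 supplies FS[4], bits 3:0 supply FS[3:0] */
    unsigned domain;  /* bits 7:4 */
    bool     write;   /* bit 11 (WnR): true if the faulting access was a write */
};

static struct fsr_fields decode_fsr(uint32_t fsr)
{
    struct fsr_fields f;

    f.status = (fsr & 0xfu) | (((fsr >> 10) & 1u) << 4);
    f.domain = (fsr >> 4) & 0xfu;
    f.write  = ((fsr >> 11) & 1u) != 0;
    return f;
}

/* Names for a few common status codes; anything else is left undecoded. */
static const char *fsr_status_name(unsigned status)
{
    switch (status) {
    case 0x05: return "translation fault (section)";
    case 0x07: return "translation fault (page)";
    case 0x0d: return "permission fault (section)";
    case 0x0f: return "permission fault (page)";
    default:   return "other";
    }
}
```

Under that assumption, 0x815 decodes to a write access that hit a section translation fault in domain 1, which agrees with the "*pgd=00000000" lines: there is no first-level mapping for address 0.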

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-1606-armadaxp 3.5.0-1606.8
ProcVersionSignature: Ubuntu 3.5.0-1606.8-armadaxp 3.5.7.1
Uname: Linux 3.5.0-1606-armadaxp armv7l
ApportVersion: 2.6.1-0ubuntu9
Architecture: armhf
Date: Fri Dec 14 18:01:38 2012
MarkForUpload: True
ProcEnviron:
 LANGUAGE=en_US:
 TERM=vt102
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-armadaxp
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
C de-Avillez (hggdh2) wrote :

Happened both on 1606 and on 1605.

Ike Panhc (ikepanhc)
Changed in linux-armadaxp (Ubuntu):
assignee: nobody → Ike Panhc (ikepanhc)
Ike Panhc (ikepanhc)
Changed in linux-armadaxp (Ubuntu):
status: New → Confirmed
Ike Panhc (ikepanhc)
Changed in linux-armadaxp (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Ike Panhc (ikepanhc) wrote :

After a few reboots with kernel armadaxp-1607.9, the reboot oops turns out to be a random event. I examined the oops text and added some debug messages to the kernel, but the test kernel shows no reboot oops.

diff --git a/arch/arm/plat-armada/hotplug.c b/arch/arm/plat-armada/hotplug.c
index db72c99..2cc1c22 100644
--- a/arch/arm/plat-armada/hotplug.c
+++ b/arch/arm/plat-armada/hotplug.c
@@ -83,13 +83,17 @@ void __ref platform_cpu_die(unsigned int cpu)
         * we're ready for shutdown now, so do it
         */

+pr_info("ikepanhc: %s:%d\n", __FUNCTION__, __LINE__);
        flush_cache_all();
 #ifdef CONFIG_SHEEVA_DEEP_IDLE
+pr_info("ikepanhc: %s:%d\n", __FUNCTION__, __LINE__);
        armadaxp_fabric_prepare_hotplug();
 #endif
        /* none zero means deepIdle wasn't entered and regret event happened */

+pr_info("ikepanhc: %s:%d\n", __FUNCTION__, __LINE__);
        platform_do_lowpower(cpu);
+pr_info("ikepanhc: %s:%d\n", __FUNCTION__, __LINE__);

        /*
         * bring this CPU back into the world of cache

Revision history for this message
Ike Panhc (ikepanhc) wrote :

After more testing,

[ 80.055601] Unable to handle kernel paging request at virtual address 60000093
[ 80.056939] CPU2: shutdown
[ 80.056946] ikepanhc: platform_cpu_die:86
[ 80.057287] ikepanhc: platform_cpu_die:89
[ 80.057305] Unable to handle kernel paging request at virtual address 60000093
[ 80.057309] pgd = c0004000
[ 80.057322] Internal error: Oops: 15 [#1] SMP ARM
[ 80.057325] Modules linked in: dm_multipath dm_raid45 dm_mirror dm_region_hash dm_log
[ 80.057347] CPU: 2 Not tainted (3.5.0-1608-armadaxp #10~d20130204t132133)
[ 80.057357] PC is at uid_cache_init+0x3c/0xac
[ 80.057363] LR is at platform_cpu_die+0x44/0xb4

The last marker printed is line 89, just before the call to armadaxp_fabric_prepare_hotplug(), so the problem is within that function.
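The narrowing technique used here (a marker before each call, so that the last marker seen identifies the call that never returned) can be sketched in user space as follows. This is only an illustration: the step functions are hypothetical stand-ins, and a non-zero return value stands in for the crash.

```c
#include <stdio.h>

/* A step returns 0 on success; non-zero stands in for "crashed here". */
typedef int (*step_fn)(void);

/* Print a marker before each step, as the pr_info() lines do, and return
 * the index of the first step that fails, or -1 if every step completes. */
static int run_with_checkpoints(const step_fn *steps,
                                const char *const *names, int n)
{
    for (int i = 0; i < n; i++) {
        printf("checkpoint: about to call %s\n", names[i]);
        if (steps[i]() != 0)
            return i;   /* no later checkpoint is ever printed */
    }
    return -1;
}

/* Hypothetical stand-ins for the three calls in platform_cpu_die(). */
static int fake_flush(void)    { return 0; }
static int fake_prepare(void)  { return 1; }   /* simulated crash */
static int fake_lowpower(void) { return 0; }
```

Running the three stand-ins in order prints two checkpoints and returns index 1, mirroring the log above where only the markers at lines 86 and 89 appear.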

Revision history for this message
Ike Panhc (ikepanhc) wrote :

After adding some printk output, the reported PC changed:

[ 67.748144] Unable to handle kernel paging request at virtual address 60000093
[ 67.749548] ikepanhc: platform_cpu_die:86
[ 67.749553] CPU2: shutdown
[ 67.749890] ikepanhc: platform_cpu_die:89
[ 67.749904] Unable to handle kernel NULL pointer dereference at virtual address 00000093
[ 67.749909] pgd = c0004000
[ 67.749921] Internal error: Oops: 15 [#1] SMP ARM
[ 67.749925] Modules linked in: dm_multipath dm_raid45 dm_mirror dm_region_hash dm_log
[ 67.749946] CPU: 2 Not tainted (3.5.0-1608-armadaxp #10~d20130204t154142)
[ 67.749959] PC is at armadaxp_fabric_prepare_hotplug+0x7c/0x10c
[ 67.749967] LR is at platform_cpu_die+0x27/0xb4
[ 67.749973] pc : [<c08b949c>] lr : [<c065bc6b>] psr: 20000093
[ 67.749973] sp : ef063f90 ip : ef063ed8 fp : ef063fa4
[ 67.749977] r10: 00000000 r9 : 562f5842 r8 : 0000001d
[ 67.749981] r7 : c094378c r6 : c07ea3e4 r5 : c0943118 r4 : 00000002
[ 67.749986] r3 : 60000093 r2 : 00000000 r1 : 00000093 r0 : 0000001d
[ 67.749991] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 67.749995] Control: 10c53c7d Table: 0000406a DAC: 00000015
[ 67.750000] Process swapper/2 (pid: 0, stack limit = 0xef0622f0)
[ 67.750004] Stack: (0xef063f90 to 0xef064000)
[ 67.750011] 3f80: ef062000 00000002 ef063fbc ef063fa8
[ 67.750020] 3fa0: c065bc20 c065bc50 ef062000 c0681fe8 ef063fd4 ef063fc0 c000ede4 c065bbf4
[ 67.750029] 3fc0: c0915df8 00000002 ef063ff4 ef063fd8 c066c14c c000ed94 c066bb1c 2f05406a
[ 67.750037] 3fe0: 00000015 10c03c7d 00000000 ef063ff8 0066bb34 c066c050 fdefffdf feff9afe
[ 67.750040] Backtrace:
[ 67.750052] [<c065bc44>] (platform_cpu_die+0x0/0xb4) from [<c065bc20>] (cpu_die+0x38/0x5c)
[ 67.750055] r5:00000002 r4:ef062000

Revision history for this message
Ike Panhc (ikepanhc) wrote :

I ran the reboot test on all SRU kernels; the problem was introduced between quantal-1603.5 and 1604.6.

I've rebooted the system with 1603.5 more than 50 times and every reboot was successful.

Revision history for this message
Ike Panhc (ikepanhc) wrote :

The new plan is to bisect the delta between those two versions.
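As a rough model of what the bisection does (in practice `git bisect` drives it), here is a minimal sketch of the search over an ordered commit list. The predicate is an assumption: because the oops is intermittent, a "good" verdict at each step would itself have to come from many clean reboots. The function names and the commit count used in any example are hypothetical.

```c
#include <stddef.h>

/* Returns non-zero if the kernel built at commit index i shows the bug. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Binary search over n >= 1 commits ordered oldest (0) to newest (n - 1).
 * Precondition: the base before index 0 is good and commit n - 1 is bad.
 * Returns the index of the first bad commit and stores the number of
 * kernels that had to be tested in *tests. */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx,
                               size_t *tests)
{
    size_t lo = 0, hi = n - 1;   /* invariant: first bad commit is in [lo, hi] */

    *tests = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        (*tests)++;
        if (is_bad(mid, ctx))
            hi = mid;        /* bug already present: culprit is mid or earlier */
        else
            lo = mid + 1;    /* still good: culprit is later */
    }
    return lo;
}

/* Hypothetical predicate: every commit from index *ctx onward is bad. */
static int bad_from(size_t i, void *ctx)
{
    return i >= *(size_t *)ctx;
}
```

The number of kernels to build and test grows with the logarithm of the number of commits in the delta, which is what makes bisecting practical even when each test means dozens of reboots.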

Revision history for this message
Ike Panhc (ikepanhc) wrote :

The root cause is one of the commits in this list:

ccbce003cb6bf23af61a3974ea2a1bb1311dbd53 xfrm_user: don't copy esn replay window twice for new states
68b3ed3a9ef58b0c7afcc2d0d79fda45ab10c0e7 xfrm_user: fix info leak in copy_to_user_tmpl()
c636c5cfbb60c465c4c29034fcfbc837efa0e9ff xfrm_user: fix info leak in copy_to_user_policy()
7c4626d6d6d6de91c47b9e7dacc49648cd846086 xfrm_user: fix info leak in copy_to_user_state()
d789372fc6af54f8f5661cc4348534a73147c08d xfrm_user: fix info leak in copy_to_user_auth()
33958b034e874b84264e9063547d03da6a457b9d xfrm: fix a read lock imbalance in make_blackhole
8195bbaed1fc01ce00ce059be8ddb8ae208e6e5d xfrm_user: return error pointer instead of NULL #2
3703996bf6768800e7e1a48ad81594dd0db20381 xfrm_user: return error pointer instead of NULL
a889f40682a5861945a61e87e8b00482bcda4889 xfrm: Workaround incompatibility of ESN and async crypto
14ceea3bad8a8e66877d878c7e26fdbc238e0cbe tcp: fix regression in urgent data handling
5f9091e0e9917c4685780c15d4971f18ef397909 bnx2x: fix rx checksum validation for IPv6
9bfa25ed26cc3b811ca98b5ecfa4073dbc4bd6f3 localmodconfig: Fix localyesconfig to set to 'y' not 'm'
b630302e4d6e76d2d03270691c93662382011c64 jbd2: don't write superblock when if its empty
ca56abd495b5824dc8796e6fb503c868e65b4fa2 workqueue: add missing smp_wmb() in process_one_work()
1abfb59a8531f64a039ef578cb0603f43cb1bccf PM / Sleep: use resume event when call dpm_resume_early
30a8849e379ba2e9d8a1b7b2010fb4f4361e7f14 rapidio/rionet: fix multicast packet transmit logic
30beffb1402ef591617580c13648999e51aea248 ixgbe: fix PTP ethtool timestamping function
2dba2817215b7a59fd74538e8eae54628e9892d0 powerpc/eeh: Fix crash on converting OF node to edev
77dba6aea72c698957ada37a46d43cecef1a14ee lguest: fix occasional crash in example launcher.
9b89724f12a9a691849cebcd34deb86ae53282ed drivers/scsi/atp870u.c: fix bad use of udelay
86d3a9905f2a1861dcd22b4603e2a6d4e7fe9ebb kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()
4237d80449d342a2557a17ddbcb77fd72b5481fe lib/gcd.c: prevent possible div by 0
90c6acefe4366b41f9dd31265dd71e71e332fcea mfd: max8925: Move _IO resources out of ioport_ioresource
9bc3de8ef4691b80a3a67247f4f87868f17da793 PCI: acpiphp: check whether _ADR evaluation succeeded
6847295042af9d22c0dadda8121a09d3cd6ffa40 ACPI: run _OSC after ACPI_FULL_INITIALIZATION
b3b778f88bbf5c88456f4c6bb072303598a82ecf media: gspca_pac7302: add support for device 1ae7:2001 Speedlink Snappy Microphone SL-6825-SBK
4103b8cec5cc40545dd07fded026c3221f505a3e media: rc: ite-cir: Initialise ite_dev::rdev earlier
1a61e4daae2748335673d8f41e3202109dfedadd em28xx: Make all em28xx extensions to be initialized asynchronously
0b548af1fc2a88aaf85bfdfeac505693932cc4db ARM: 7548/1: include linux/sched.h in syscall.h
9d7bd9a3db73097582fa65a0537970873693791c intel-iommu: Default to non-coherent for domains unattached to iommus
df70bc2f8ac45b8b05ebe779ab8d8ba58d8443de slab: fix the DEADLOCK issue on l3 alien lock
271b70aa809cb09b630b8a927d8c4fc7d5142021 kbuild: Fix gcc -x syntax
3564f5ad89796c29c2cfaceb721fc31d0d7d4f59 kbuild: make: fix if_changed when command contains backslashes
bc167c183d5f39f016e39fe2586702c7769340fa mn10300: only add -mmem-funcs to K...


Revision history for this message
Ike Panhc (ikepanhc) wrote :

Looks like this is the root cause. I am testing to make sure.

$ git show 86d3a9905f2a1861dcd22b4603e2a6d4e7fe9ebb
commit 86d3a9905f2a1861dcd22b4603e2a6d4e7fe9ebb
Author: Shawn Guo <email address hidden>
Date: Thu Oct 4 17:12:23 2012 -0700

    kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()

    BugLink: http://bugs.launchpad.net/bugs/1066176

    commit f96972f2dc6365421cf2366ebd61ee4cf060c8d5 upstream.

    As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
    have kernel_restart() call disable_nonboot_cpus(). Doing so can help
    machines that require boot cpu be the last alive cpu during reboot to
    survive with kernel restart.

    This fixes one reboot issue seen on imx6q (Cortex-A9 Quad). The machine
    requires that the restart routine be run on the primary cpu rather than
    secondary ones. Otherwise, the secondary core running the restart
    routine will fail to come to online after reboot.

    Signed-off-by: Shawn Guo <email address hidden>
    Signed-off-by: Andrew Morton <email address hidden>
    Signed-off-by: Linus Torvalds <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
    Signed-off-by: Leann Ogasawara <email address hidden>

diff --git a/kernel/sys.c b/kernel/sys.c
index 2d39a84..0349bde 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -368,6 +368,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
 void kernel_restart(char *cmd)
 {
        kernel_restart_prepare(cmd);
+       disable_nonboot_cpus();
        if (!cmd)
                printk(KERN_EMERG "Restarting system.\n");
        else

Revision history for this message
Ike Panhc (ikepanhc) wrote :

Looks like it's the root cause: with this patch reverted, the reboot test passed 50 times. I need to review what this patch is for and see whether a revert is reasonable.

Revision history for this message
Ike Panhc (ikepanhc) wrote :

The revert is in 3.5.0-1609.11

Changed in linux-armadaxp (Ubuntu):
status: In Progress → Fix Committed
Ike Panhc (ikepanhc)
Changed in linux-armadaxp (Ubuntu Quantal):
status: New → Fix Committed
Changed in linux-armadaxp (Ubuntu):
status: Fix Committed → Invalid
Changed in linux-armadaxp (Ubuntu Quantal):
assignee: nobody → Ike Panhc (ikepanhc)
Changed in linux-armadaxp (Ubuntu):
assignee: Ike Panhc (ikepanhc) → nobody
Revision history for this message
Ike Panhc (ikepanhc) wrote :

3.5.0-1609.11 passed 400 consecutive reboots.
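To give a feel for how much evidence a run of clean reboots provides against an intermittent failure, here is a small sketch. The per-reboot failure probability p is hypothetical (the report does not state one), and the model assumes reboots fail independently.

```c
/* Probability that a kernel which fails each reboot independently with
 * probability p would still survive n reboots in a row: (1 - p)^n. */
static double prob_all_clean(double p, unsigned n)
{
    double q = 1.0;

    for (unsigned i = 0; i < n; i++)
        q *= 1.0 - p;
    return q;
}
```

For example, if the bug were to strike on 5% of reboots, a still-broken kernel would pass 50 reboots in a row a bit under 10% of the time, but would pass 400 in a row with a probability far below one in a million.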

tags: added: verification-done
Revision history for this message
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-armadaxp - 3.5.0-1609.11

---------------
linux-armadaxp (3.5.0-1609.11) quantal-proposed; urgency=low

  [ Ike Panhc ]

  * Release Tracking Bug
    - LP: #1118281
  * [Config] Enable CONFIG_EXT{3,4}_FS_XATTR
    - LP: #1102970
  * Rebase onto Ubuntu-3.5.0-24.37

  [ Upstream Kernel Changes ]

  * Revert "kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()"
    - LP: #1090591

  [ Ubuntu: 3.5.0-24.37 ]

  * Release Tracking Bug
    - LP: #1117492
  * [Config] CONFIG_ALX=m for x86 only
    - LP: #927782

  [ Ubuntu: 3.5.0-24.36 ]

  * Release Tracking Bug
    - LP: #1116501
  * [Config] Enable RTSX_PCI modules
    - LP: #1057089
  * [Config] enable various HVC consoles
    - LP: #1102206
  * Revert "SAUCE: samsung-laptop: disable in UEFI mode"
    - LP: #1111689
  * [Config] updateconfigs for 3.5.7.3 stable update
  * d-i: Add mellanox ethernet drivers to nic-modules
    - LP: #1015339
  * SAUCE: alx driver import script
    - LP: #927782
  * SAUCE: alx: Update to heads/master
    - LP: #927782
  * SAUCE: samsung-laptop: Add quirk for broken acpi_video backlight on
    N250P
    - LP: #1086921
  * (config) Move 9p modules into generic package
    - LP: #1107658
  * [debian] Remove dangling symlink from headers package
    - LP: #1112442
  * [config] CONFIG_ALX=m
    - LP: #927782
  * [Config] Add alx to d-i nic-modules
    - LP: #927782
  * Revert "8139cp: revert "set ring address before enabling receiver""
    - LP: #1102417
  * Revert "ath9k_hw: Update AR9003 high_power tx gain table"
    - LP: #1102417
  * Revert "drm/i915: no lvds quirk for Zotac ZDBOX SD ID12/ID13"
    - LP: #1102417
  * Revert "ALSA: hda - Shut up pins at power-saving mode with Conexnat
    codecs"
    - LP: #1106966, #886975
  * be2net: don't call vid_config() when there's no vlan config
    - LP: #1083088
  * be2net: cleanup be_vid_config()
    - LP: #1083088
  * be2net: do not modify PCI MaxReadReq size
    - LP: #1083088
  * be2net: fix reporting number of actual rx queues
    - LP: #1083088
  * be2net: do not use SCRATCHPAD register
    - LP: #1083088
  * be2net: Fix driver load for VFs for Lancer
    - LP: #1083088
  * be2net: Explicitly clear the reserved field in the Tx Descriptor
    - LP: #1083088
  * be2net: Regression bug wherein VFs creation broken for multiple cards.
    - LP: #1083088
  * be2net: Fix to trim skb for padded vlan packets to workaround an ASIC
    Bug
    - LP: #1083088
  * be2net: Fix Endian
    - LP: #1083088
  * be2net: Fix error while toggling autoneg of pause parameters
    - LP: #1083088
  * be2net : Fix die temperature stat for Lancer
    - LP: #1083088
  * be2net: Fix initialization sequence for Lancer
    - LP: #1083088
  * be2net: Activate new FW after FW download for Lancer
    - LP: #1083088
  * be2net: Fix cleanup path when EQ creation fails
    - LP: #1083088
  * be2net: Enable RSS UDP hashing for Lancer and Skyhawk
    - LP: #1083088
  * be2net: dont pull too much data in skb linear part
    - LP: #1083088
  * be2net: Fix VF driver load for Lancer
    - LP: #1083088
  * be2net: Ignore physical link async event for Lancer
    - LP: #1083088
  * be2net: Fix to parse RSS hash from...

Changed in linux-armadaxp (Ubuntu Quantal):
status: Fix Committed → Fix Released