log.c Assert failed - err=>number == EIO

Bug #912558 reported by Dave Walker
36
This bug affects 6 people
Affects Status Importance Assigned to Milestone
upstart
Fix Released
High
James Hunt
upstart (Ubuntu)
Fix Released
High
James Hunt
Precise
Fix Released
High
James Hunt

Bug Description

Whilst booting, init has an issue and causes a core dump -> kernel panic.

Appending --no-log to the kernel cmd line allows a successful boot.

Related branches

Revision history for this message
Dave Walker (davewalker) wrote :

Attached screenshot.

description: updated
Revision history for this message
Dave Walker (davewalker) wrote :

Whilst looking for others that have encountered this, I came across an unreferenced pastebin http://pastebin.com/sPwXJS0x , seems to be the same issue.

...
init: log.c: 291: Assertion failed in log_io_error_handler: err-> number == EIO
init: Caught abort, core dumped
init: io.c: 1312: Unhandled error from nih_io_destroy: bad file descriptor
init: Caught abort, core dumped
kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Tainted: P C O 3.2.0-7-generic #13-Ubuntu
Call Trace:
[<ffffffff81639ce7>] panic+0x91/0x1a7
[<ffffffff81069205>] forget_original_parent+0x245/0x250
[<ffffffff81069227>] exit_notify+0x17/0x160
[<ffffffff81069b03>] do_exit+0x1f3/0x420
[<ffffffff81141713>] ? do_munmap+0x1f3/0x2f0
[<ffffffff81069ed4>] do_group_exit+0x44/0xa0
[<ffffffff81069f47>] sys_exit_group+0x17/0x20
[<ffffffff8165a5c2>] system_call_fastpath+0x16/0x1b
...

Changed in upstart (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Steve Langasek (vorlon) wrote :

The output here unfortunately doesn't give many clues as to what's really going on. James, can we turn up the verbosity for this error case?

Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
tags: added: rls-p-tracking
Revision history for this message
James Hunt (jamesodhunt) wrote :

It seems that a particular job or combo of jobs is causing this issue, but unclear why (since all jobs are treated equally by the logger).

I'm currently working with Dave to try and isolate the job(s) causing the failure. I'll also create a PPA with some extra debug to identify what err->number is set to. It's looking like EBADF from the stackstrace.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu):
status: New → Confirmed
Revision history for this message
VinDSL (perfect-pecker) wrote :

I'm having this issue as well, on my Ubu 12.04 alpha 1 install. It happened after the 5-Jan-2011 update(s). I'm performing forensics now.

The --no-log workaround (above) allowed me to boot into Ubu 12.04

I might also mention...

I have two kernels installed on this machine - Linux 3.1 & Linux 3.2 - and this init bug caused a kernel panic in both of them.

Revision history for this message
VinDSL (perfect-pecker) wrote :

I meant to type 5-Jan-2012 (not 2011) above... sigh!

Anyway, I thought I should also mention, immediately preceding the kernel panic/core dump, a little flashing white line appears in Plymouth... and the end of the animated dots. Never saw that before.

After I implemented the --no-log workaround, the (previously) flashing white line was replaced with a message about CUPS, after which it continues the normal boot sequence, rather than crashing. Never saw that before, either.

This odd behavior may, or may not, be indicative. Moreover, I don't have a printer installed on this machine (local or network). LoL!

Colin Watson (cjwatson)
Changed in upstart (Ubuntu Precise):
milestone: none → precise-alpha-2
Revision history for this message
James Hunt (jamesodhunt) wrote :

@VinDSL: Thanks for reporting. Please could you list the .conf files you have. In other words, what does the following return:

  ls /etc/init/

Revision history for this message
James Hunt (jamesodhunt) wrote :

To allow us more time to diagnose this issue, we've decided to push out an updated upstart package where logging has been disabled by default.

Please update your systems to install upstart version '1.4-0ubuntu2'.

Changed in upstart (Ubuntu Precise):
importance: Critical → High
Revision history for this message
James Hunt (jamesodhunt) wrote :

I think we now understand what is happening here. Essentially, the problem occurs when jobs refer to binaries that don't exist on the system. The common case of this scenario is if you've installed a package with a job file in /etc/init/, but then removed the package. The binaries have now been removed, but the configuration files remain (unless you 'purge' the package).

The specific scenario causing the failure is if Upstart attempts to run a single-line *non-existent* command with no shell meta-characters in it, the assertion fires.

Revision history for this message
James Hunt (jamesodhunt) wrote :

If you have this problem, it would be *very* helpful to use if you could run the attached script on your system.

Please report back if you got any output from the script. If the script produces no output, you should not have a problem.

Process:

1) download script to your system.
2) chmod 755 find_missing_commands.sh
3) ./find_missing_commands.sh
4) report back results on this bug.

Thanks very much!

Revision history for this message
VinDSL (perfect-pecker) wrote :

@James Hunt

vindsl@Zuul:~$ ls /etc/init/
acpid.conf networking.conf
alsa-restore.conf network-interface.conf
alsa-store.conf network-interface-security.conf
anacron.conf network-manager.conf
apport.conf network-manager.conf~
atd.conf plymouth.conf
avahi-daemon.conf plymouth-log.conf
binfmt-support.conf plymouth-splash.conf
console-setup.conf plymouth-stop.conf
control-alt-delete.conf plymouth-upstart-bridge.conf
cron.conf procps.conf
cups.conf rc.conf
dbus.conf rcS.conf
dmesg.conf rc-sysinit.conf
failsafe.conf rsyslog.conf
failsafe-x.conf setvtrgb.conf
friendly-recovery.conf tty1.conf
hostname.conf tty2.conf
hwclock.conf tty3.conf
hwclock-save.conf tty4.conf
irqbalance.conf tty5.conf
lightdm.conf tty6.conf
modemmanager.conf udev.conf
module-init-tools.conf udev-fallback-graphics.conf
mountall.conf udev-finish.conf
mountall-net.conf udevmonitor.conf
mountall-reboot.conf udevtrigger.conf
mountall-shell.conf ufw.conf
mounted-debugfs.conf upstart-socket-bridge.conf
mounted-dev.conf upstart-udev-bridge.conf
mounted-proc.conf ureadahead.conf
mounted-run.conf ureadahead-other.conf
mounted-tmp.conf wait-for-state.conf
mounted-var.conf
vindsl@Zuul:~$

Revision history for this message
VinDSL (perfect-pecker) wrote :

@James Hunt

vindsl@Zuul:~/Downloads$ sh find_missing_commands.sh
NetworkManager
vindsl@Zuul:~/Downloads

I'm running Wicd as my network manager.

network-manager refuses to work (on my PC) under Linux 3.2, e.g. no network connection(s), and was uninstalled, some time ago.

Revision history for this message
Kaulbach (mystic-scientist) wrote :

The pastebin from posting number 2 is mine, and yes the same problem. I manually wrote down from the kernel panic screen then transcribed it to a file while in recovery mode, used pastebinit to post.
Installing the upstart version mentioned earlier, it finally allowed me to boot normally.
Previously it would say it's doing a core dump but could never find one on the disk.
Also when booting to recovery mode, issuing: shutdown -h now would give the same kernel panic.
Posting number 10 at http://ubuntuforums.org/showthread.php?s=55d242cb643cf1b8293c27b55351e73e&t=1904842 seems to offer a possible solution with:
dpkg -l |awk ‘/^rc/ {print $2}’ |xargs sudo dpkg --purge

to clear out old conf files

Revision history for this message
James Hunt (jamesodhunt) wrote :

@VinDSL: Thanks very much! Your result proves the theory. Anyone else get any results from this script?

Revision history for this message
James Hunt (jamesodhunt) wrote :

@Kaulbach: Thank you for taking the time to transcribe the panic.

A full fix for this problem has already been written but requires new test cases to be written too (it also requires the full test suite to be run extensively).

Rest assured, we intend to put out an updated Upstart package as soon as possible with logging re-enabled.

Revision history for this message
Kaulbach (mystic-scientist) wrote :

mike@cornwall:~$ ./find_missing_commands.sh
nmbd
smbd
ubiquity
mike@cornwall:~$

Revision history for this message
Vinycius Maia (mrk3004-deactivatedaccount) wrote :

mrk3004@viny-server:~$ ./find_missing_commands.sh
mrk3004@viny-server:~$

Revision history for this message
VinDSL (perfect-pecker) wrote :

Just got home and updated the Upstart Package from 1.4-0ubuntu1 -> 1.4-0ubuntu2

No more kernel panic!!!

Thanks, for your quick attention to this dire situation, James Hunt!

Revision history for this message
James Hunt (jamesodhunt) wrote :

@VinDSL: no problem. Thank *you* for working with us to identify the problem.

Revision history for this message
James Hunt (jamesodhunt) wrote :

@Vinycius: Please could you run the attached "find_missing_commands.pl" Perl script. This is an improvement over the original shell script since it now checks for *all* shell meta characters rather than just the most common ones.

Process:

1) download find_missing_commands.pl script to your system.
2) chmod 755 find_missing_commands.pl
3) ./find_missing_commands.pl
4) report back results on this bug.

Thanks very much!

Revision history for this message
Vinycius Maia (mrk3004-deactivatedaccount) wrote :

I updated the Upstart package to 1.4-0ubuntu2

mrk3004@viny-server:~/Build$ ./find_missing_commands.pl
mrk3004@viny-server:~/Build$

James Hunt (jamesodhunt)
Changed in upstart:
importance: Undecided → High
status: New → Incomplete
status: Incomplete → Opinion
status: Opinion → Confirmed
assignee: nobody → James Hunt (jamesodhunt)
Revision history for this message
VinDSL (perfect-pecker) wrote :

@James Hunt: As an aside, I tested the more inclusive perl script (above). Ran fine -- same result(s)

vindsl@Zuul:~/Downloads$ ./find_missing_commands.pl
relative command NetworkManager not found
vindsl@Zuul:~/Downloads$

Thanks, again!

Revision history for this message
Kaulbach (mystic-scientist) wrote :

mike@cornwall:~$ ./find_missing_commands.pl
relative command nmbd not found
relative command smbd not found
relative command ubiquity not found
mike@cornwall:~$

James Hunt (jamesodhunt)
Changed in upstart:
status: Confirmed → Fix Committed
James Hunt (jamesodhunt)
Changed in upstart (Ubuntu Precise):
status: Confirmed → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.4-0ubuntu3

---------------
upstart (1.4-0ubuntu3) precise; urgency=low

  [ Stéphane Graber ]
  * Mark upstart Multi-Arch:foreign

  [ James Hunt ]
  * Merge of important logger fixes from upstream lp:upstart
    (LP: #912558).
  * Reverted temporary fix to disable job logging.
 -- James Hunt <email address hidden> Thu, 26 Jan 2012 15:13:25 +0000

Changed in upstart (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Vincent Fortier (th0ma7) wrote :

Perhaps this bug is still present? I'm having the exact same issue at boot-time....

root@th0ma7:/home/th0ma7# ./find_missing_commands.pl
relative command gdm-binary not found
relative command kdm not found

root@th0ma7:/home/th0ma7# dpkg -l | grep -i upstart
ii upstart 1.4-0ubuntu8 event-based init daemon

Revision history for this message
Vincent Fortier (th0ma7) wrote :
James Hunt (jamesodhunt)
Changed in upstart:
status: Fix Committed → Fix Released
Revision history for this message
Vincent Fortier (th0ma7) wrote :

System now using upstart 1.5 (as per latest updates...). Fix confirmed as functionnal. thnx!

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.