Upstartification of /etc/init.d/networking has lost deconfiguring-networking event causing bad side-effects

Bug #1061639 reported by James Hunt
This bug affects 52 people
Affects            Status         Importance   Assigned to       Milestone
ifupdown (Ubuntu)  Fix Released   High         Stéphane Graber
  Quantal          Fix Released   High         Stéphane Graber

Bug Description

In precise, /etc/init.d/networking was a true SysV service script that did the following:

initctl emit deconfiguring-networking

In quantal, /etc/init.d/networking is now using the upstart-job symlink back to the upstart job /etc/init/networking.conf.

The problem is that the 'deconfiguring-networking' event is no longer being emitted. This causes dbus to fail to stop on system shutdown, which has a cascading effect whereby other Upstart jobs are also not shut down. Eventually, the system halts with the message:

mount: / is busy

This results in an unclean shutdown, which can lead to fsck being run on the next boot and a slow, poor boot experience for the user.

The two main options here are:

1) Re-instate the 'deconfiguring-networking' event.
2) Change the dbus 'stop on' condition and update upstart-events.7 to remove 'deconfiguring-networking'.
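
Before choosing between the two options, it helps to know which jobs actually wait on the event. The following is a minimal sketch, assuming Upstart job files live in /etc/init and that each `stop on` condition sits on a single line (multi-line conditions would need a real parser). The function name `jobs_stopping_on` is made up for illustration.

```shell
#!/bin/sh
# List Upstart jobs whose "stop on" line mentions a given event.
# Usage: jobs_stopping_on EVENT [JOB_DIR]   (JOB_DIR defaults to /etc/init)
jobs_stopping_on() {
    event=$1
    dir=${2:-/etc/init}
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        # Simplification: only single-line "stop on" stanzas are examined.
        if grep -E '^[[:space:]]*stop on' "$conf" | grep -q -w -F -- "$event"; then
            basename "$conf" .conf
        fi
    done
}
```

Calling `jobs_stopping_on deconfiguring-networking` prints one job name per line; every job listed would never be stopped if the event is not emitted.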


James Hunt (jamesodhunt)
Changed in ifupdown (Ubuntu):
importance: Undecided → High
Steve Langasek (vorlon) wrote :

Stéphane, this is a regression in the latest upload that needs to be fixed for release. Thanks!

Changed in ifupdown (Ubuntu Quantal):
assignee: nobody → Stéphane Graber (stgraber)
status: New → Triaged
dino99 (9d9) wrote :

On the last shutdown in verbose mode, I've seen one more issue:

During the last part of the shutdown process, a message is written that says:
- trying to retrieve the remaining processes
- then, strangely, modemmanager is restarted and loads its dozen drivers (I got a modemmanager daily update, "0.6.0.0.really-0ubuntu1"). I have already seen drivers reload at random on shutdown in the past.
- and I get: mount: / is busy

With the next reboot, the booted partition has broken inodes:

oem@dub:~$ dmesg | grep orphan
[ 8.812542] EXT4-fs (sdb5): orphan cleanup on readonly fs
[ 8.819511] EXT4-fs (sdb5): ext4_orphan_cleanup: deleting unreferenced inode 400809
[ 8.819581] EXT4-fs (sdb5): 1 orphan inode deleted
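
To compare boots, the orphan-inode lines can be totalled automatically. This is a small sketch, assuming the kernel log lines look like the ones quoted above; the function name `count_orphans` is hypothetical.

```shell
#!/bin/sh
# Sum the "N orphan inode(s) deleted" lines found in kernel log text on stdin.
# Usage: dmesg | count_orphans
count_orphans() {
    awk '/orphan inodes? deleted/ {
             # The count is the field just before the word "orphan".
             for (i = 2; i <= NF; i++)
                 if ($i == "orphan") total += $(i - 1)
         }
         END { print total + 0 }'
}
```

A result of 0 after a reboot suggests the previous shutdown left no orphaned inodes behind on any ext4 filesystem mentioned in the log.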

Changed in ifupdown (Ubuntu Quantal):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7.2ubuntu2

---------------
ifupdown (0.7.2ubuntu2) quantal; urgency=low

  * Fix regression with the removal of the sysvinit networking job:
    - Don't bring 'lo' down (add it to --exclude)
    - Emit deconfiguring-networking (LP: #1061639)
 -- Stephane Graber <email address hidden> Tue, 09 Oct 2012 11:13:27 -0400
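
The two changelog items can be pictured as the following stop sequence. This is only a sketch and not the code shipped in the package: the function names `run` and `stop_networking` and the `DRY_RUN` switch are hypothetical, while `initctl emit` and `ifdown -a --exclude=lo` are the commands the changelog entries allude to.

```shell
#!/bin/sh
# Sketch of a networking "stop" step that announces itself first.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_networking() {
    # Tell Upstart jobs such as dbus that the network is about to go away.
    run initctl emit deconfiguring-networking
    # Bring down every configured interface except loopback.
    run ifdown -a --exclude=lo
}
```

The ordering is the point of the sketch: the event goes out before any interface is taken down, so jobs that stop on it get their chance to exit first.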

Changed in ifupdown (Ubuntu Quantal):
status: In Progress → Fix Released
the-unconventional (the-unconventional-deactivatedaccount-deactivatedaccount) wrote :

I can confirm that the unmounting issues are indeed fixed, and no more fsck warnings are shown in dmesg, but shutting down Quantal still takes considerably longer than shutting down Precise.

Before this update, I could run 'sudo service networking stop', which would speed up the shutdown process a lot (which caused me to believe it was somehow related to network-manager), but since this update, stopping the networking service also seems to disable ConsoleKit, which will require me to enter my password in order to shut down; and the speed increase also seems to be gone.

Anyway, there is still something out there that causes Quantal to hang for a few seconds during shutdown, that might be somehow related to this.

Stéphane Graber (stgraber) wrote :

ConsoleKit is a dbus service and the fix for this bug was to emit the event that'd make dbus go away, so nothing unexpected here.

I'd expect any remaining slowness to be related to another upstart or sysvinit job taking longer than it used to on precise.

Jan Rathmann (kaiserclaudius) wrote :

I still get a message "mount: / is busy" at every shutdown in Quantal, I'm not sure whether this means this bug is still not fixed on some systems or if this is a different bug that I should report separately.

Kind regards,
Jan

Christian Niemeyer (christian-niemeyer) wrote :

For me this is NOT FIXED. I did a clean install twice yesterday (October 16th) with the daily images of quantal (amd64).

1.) Clean install onto a flash drive. 2.) Clean install onto SATA HDD. Both with the ext4 filesystem.

Both *never* shut down cleanly. This is really problematic and makes the filesystem inconsistent over time, or at least carries a very high risk of it. After a few reboots I set tune2fs -c 1 /dev/sdX to make sure that fsck is forced to check every time, to avoid inconsistencies. And on every reboot, or shutdown and boot, fsck finds errors. The error messages "/ is busy" are described here: https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1058987

I added some sync; sync; sync; and sleep 3 to /etc/init.d/umountfs, and some killall dnsmasq, etc.; it makes no difference. It hangs in the end again on "/ device is busy", powers off, ==> unclean filesystem at boot.

So far I lost one (testing) filesystem completely, which has become totally inconsistent (on a new 400GB SATA HDD). My parallel installations of 10.04 and 11.04 never had this behaviour.

If I unmount and boot the Live-CD/USB with these installer images and run fsck on the freshly installed filesystem on the hard drive, fsck also finds errors. Mostly something like "deleted inode has zero dtime. FIXED."

PS: I also tried both letting the installer format my filesystem and creating it manually beforehand and just selecting it for "/" without formatting. It makes no difference.

Hm, this really drives me crazy.

PS: I dropped precise because of this issue. Still remaining in quantal. Good luck with fixing!

Christian Niemeyer (christian-niemeyer) wrote :

FINALLY worked! :) (for the first time ever with quantal!)

I spent about 2 hours trying to shut down manually, stopping and starting init scripts, networking etc. This is the only way I found, and it had to be done in exactly this order. The downside: I have not yet figured out the exact problem. I think many of the commands are not mandatory. Now would be the time to sort them out.

I wrote a pseudo log file. Out of my bash history. I'm not an expert, but I really hope you get the idea:

>>"(NOTE: DONE BEFORE: sudo apt-get remove --purge modemmanager
COMMENT: "TESTED IT WITH REBOOT, DIDN'T DO ANYTHING GOOD OR BAD ITSELF")

BOOT, FSCK ERROR MESSAGE (EXIT 1, ERRORS RESOLVED), BOOT CONTINUING...

LIGHTDM

CTRL+ALT+F2

LOGIN TERMINAL

sudo service lightdm stop

sudo service dbus stop
COMMENT: "THAT DID HANG FOR A WHILE, APPROX 5 SECONDS / THE SAME IF YOU STOP NETWORKING BEFORE DBUS, IT HANGS, KILLS DBUS, -- THIS ORDER IS BETTER"

sudo service dbus start

sudo service dbus stop
COMMENT: "NOW BOTH STARTS AND STOPS VERY FAST AND CLEAN"

sudo service networking stop
COMMENT: "NETWORKING STOPPED FAST AND CLEANLY"

sudo service networking start
COMMENT: "THEORY: FOR CLEAN SHUTDOWN WE HAVE START/STOP YET AGAIN"

sudo service networking stop

sudo service rsyslog stop

sudo modprobe -r forcedeth bnep rfcomm bluetooth
COMMENT: "SOMETIMES THE FORCEDETH MODULE BRINGS EVERYONE IN TROUBLE. WHAT ABOUT FORCEDETH? AND I GOT SOME MESSAGES FROM THE BLUETOOTH MODULE (WHICH I DON'T HAVE, BTW), SO UNLOADING IT FOR A CLEAN STATE. BUT IT DIDN'T HELP WHEN ONLY DOING THIS"

sync

sudo init 1
COMMENT: "TAKES A WHILE, AND KILLING ALL REMAINING PROCESSES SAYS IT FAILED"

COMMENT: "NOW WE'RE ROOT, SO NO SUDO. BUT I ADD IT FOR NO CONFUSION"

sudo service udev stop
COMMENT: "STILL THERE STILL GOING (I THINK THAT'S CORRECT). AND STOPS FAST AND CLEAN."

sudo /etc/init.d/networking start
COMMENT: "AGAIN STARTING, BUT ONLY INIT.D, NOT USING "SERVICE". IT SEEMS THAT STARTING/STOPPING THOSE SCRIPTS SOLVES OUR PROBLEM"

COMMENT: "...AND NOW STOPPING POSSIBLY ALL OF THEM"

sudo /etc/init.d/networking stop

sudo /etc/init.d/network-manager stop

sudo /etc/init.d/network-interface-security stop

sudo /etc/init.d/network-interface-container stop

sudo /etc/init.d/network-interface stop

COMMENT: "ALL STOPS FAST AND CLEAN, PROMPTS"

COMMENT: "NEARLY DONE! NOW WE MAKE SURE THAT FSCK IS FORCED TO RUN. IF THERE ARE ERRORS IT WILL SAY "WAS NOT CLEANLY UNMOUNTED. RETURN/EXIT VALUE: [1], PRINTED ON SCREEN". IF NO ERROR IT WILL SAY "WAS MOUNTED X TIMES, CHECK FORCED. RETURN VALUE: 0. NO EXIT/ERROR MESSAGE PRINTED, CLEAN; CONTINUING BOOT""

sudo tune2fs -c 1 /dev/sda1 #max mount counts to 1

sudo tune2fs -C 100 /dev/sda1 #make believe, it was mounted 100 times already, will trigger fsck on reboot

COMMENT: "DOING TESTING"

sudo lsof /

sudo fuser /

COMMENT: "AHA! LSOF STILL SHOW 2 (TWO!) PROCESSES NTPD RUNNING. BETTER KILL THEM. HOWEVER, I DON'T THINK THEY ARE THE PROBLEM. BECAUSE IF I ONLY KILL THEM, WITHOUT ALL THE PROCEDURE EXACTLY(!) IN ORDER ABOVE, THEN IT DOESN'T HELP"

sudo killall -15 ntpd

COMMENT: "CHECKING"

sudo ps -e

sudo lsmod

COMMENT: "LOOKS GOOD!"

sudo sync; sync; sync; sudo init 6

***REBOOOT***

....AND...

...

Christian Niemeyer (christian-niemeyer) wrote :

Final release images tested. Clean standard install, default options on 400GB HDD. Still "/ device is busy".

Shutdown standard. Log out. And in lightdm > Reboot. Problem remains! With the Live DVD testing the filesystem after shutdown and booting the DVD instead, it says:

"sudo fsck.ext4 -Ffy /dev/sdb1
e2fsck 1.42.5 (29-Jul-2012)
quantalx64: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (4431956, counted=4432263).
Fix? yes

Free inodes count wrong (513823, counted=513874).
Fix? yes

quantalx64: ***** FILE SYSTEM WAS MODIFIED *****"

**PS**: Still remains that "sudo service dbus stop" and/or "sudo service networking stop" hangs for about 10 seconds. I think the problem remains, that dbus is not shut down cleanly.

Christian Niemeyer (christian-niemeyer) wrote :

Here the same behaviour is described:
"Failure to umount root file system at shutdown"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1044640

Christian Niemeyer (christian-niemeyer) wrote :

Err, hello? This is not fixed. I did an installation twice after release day. I have no problem with it, I did "sudo apt-get remove --purge dnsmasq-base resolvconf wpasupplicant isc-dhcp-client isc-dhcp-common libnm-glib-vpn1 libnm-glib4 libnm-gtk-common libnm-gtk0 libnm-util2 network-manager network-manager-gnome ubuntu-minimal ntp plymouth-label plymouth-theme-lubuntu-logo plymouth-theme-lubuntu-text plymouth-theme-ubuntu-text mobile-broadband-provider-info blueman bluez lubuntu-core lubuntu-desktop modemmanager obex-data-server ppp pppconfig pppoeconf rfkill wvdial mlocate"

After that, it worked.

Dear folks, this is a MAJOR bug. I think it is the dbus script. And I guess it only happens with PCs/installations with *wired* connections. Can this be possible? I can reproduce this error 100 times out of 100.

Because data loss is a real possibility for normal users, I'm quite "emotional" about this bug. Not because I'm angry (it works for me), but because it is very, very bad for Ubuntu's reputation.

I have used Ubuntu since 2005. I'm quite an advanced user, though no professional. BUT I never had an issue like this. Never. It is reproducible on different machines. Unclean shutdowns *always*. And still going. No SRU so far. Sorry, I can't understand this.

PS: Hint #1: Dbus. Hint #2: Maybe this only happens on machines with wired connections? (I have forcedeth, working flawlessly though.)

Marius B. Kotsbak (mariusko) wrote :

Christian, this bug is closed as fixed, so it should not be opened again, so could you please run "ubuntu-bug ifupdown" in a terminal. Also state there if this happens on new installations or only upgraded installations. Please add a reference here to the new bug report number.

Stéphane Graber (stgraber) wrote :

Please don't file it against ifupdown as it's not the problem anymore, if you're still seeing this bug, it's very likely to be something completely unrelated to ifupdown.

Marius B. Kotsbak (mariusko) wrote :

Then I would suggest "upstart", "dbus" or "network-manager".

yota (yota-opensystems) wrote :

To detect what is keeping / filesystem busy I put a:

lsof / > /openfiles.txt
sync

on top of /etc/init.d/umountroot

In my case it turns out that dhclient and dnsmasq are most likely the root cause; who is supposed to bring them down?
I disabled dnsmasq by commenting out dns=dnsmasq in /etc/NetworkManager/NetworkManager.conf and added a "killall dhclient" in if-down.d, and now shutdowns are clean on my 3 machines.
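
The saved lsof output can be boiled down to a per-command count to see at a glance what holds / open. This is a minimal sketch, assuming the usual lsof layout with a single header line and the command name in the first column; the function name `busy_commands` is made up.

```shell
#!/bin/sh
# Count open files per command in saved `lsof` output (read from stdin).
# Usage: busy_commands < /openfiles.txt
busy_commands() {
    # Skip the header line, tally the first column, then sort by count.
    awk 'NR > 1 { count[$1]++ }
         END { for (cmd in count) print count[cmd], cmd }' | sort -k1,1nr -k2,2
}
```

The commands at the top of the list are the first candidates to check when "mount: / is busy" appears.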

On the other hand "service networking stop" still takes ages to complete (but only the first time you do it), which leads me to think that there is also something else which is not working properly with dbus.

This issue seems really critical to me as well: shutdown time is worse (at least 3x in my case) while boot time suffers from the time needed for fsck (which is triggered every time since the fs is dirty).
So user experience is impaired and data integrity is at risk, but it is not clear which package to file a bug against.
Especially since it can be some weird interaction between them, I would not take for granted that ifupdown is no longer relevant.

Marius B. Kotsbak (mariusko) wrote :

Could it be bug #1070647 that you are seeing?

yota (yota-opensystems) wrote :

Thank you Marius!

Yes, modem-manager was causing a significant part (but not all) of the shutdown delay. I'll post my findings about it in the relevant bug report.

Still I believe that ifupdown could handle dhclient and/or dnsmasq that were keeping / busy.

Ernie 07 (ernestboyd) wrote :

Yes, a fix was released.
NO, the problem has NOT been fixed.
When testing with a fresh 64-bit install via the 12.10 Released LiveDVD (from iso), this problem is 100% reproducible and occurs 100% of the time. Please see launchpad bug 1073433.

Also I have attached fsck output.

Ernie 07 (ernestboyd) wrote :

For my environment (Ethernet DSL to Internet) a temporary workaround follows:

1. UNCHECK Enable Networking
2. Wait until after the disconnected message goes away
3. Restart and Shutdown
4. Fsck from an alternate installation will NOT throw any errors.

Apparently Unchecking Enable Networking does something that a simple restart or shutdown does NOT in terms of preparing for a SAFE shutdown.

yota (yota-opensystems) wrote :

Ok, I've experienced again the problem on a new machine on which quantal was installed from scratch.
This is everything I've done to solve all issues and get back to a fast 5 seconds shutdown with no filesystem corruption:

- rm /etc/init/modem-manager.conf (it's started by network manager via dbus, we don't need a separate conf...)
- edit /etc/init/modem-manager.conf and /etc/init/dbus.conf to stop also on "runlevel [06]" (i.e. add "or runlevel [06]" at the end of the "stop on" line)
- add a "cleanup" script (chmod 755) in /etc/network/if-down.d/ which does "killall dhclient" and "killall dnsmasq"
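
The "cleanup" hook from the last item could look roughly like this. It is a sketch under assumptions: the function name `cleanup_clients` and the `DRY_RUN` switch are hypothetical, and skipping the loopback interface is an addition that is not part of the steps above. ifupdown passes the interface name to hook scripts in the `IFACE` environment variable.

```shell
#!/bin/sh
# Sketch of /etc/network/if-down.d/cleanup (must be executable, chmod 755).
# With DRY_RUN=1 the commands are printed instead of executed.
cleanup_clients() {
    # Assumption: leave the helpers alone when only loopback goes down.
    [ "${IFACE:-}" = lo ] && return 0
    for daemon in dhclient dnsmasq; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "killall $daemon"
        else
            # killall exits non-zero when nothing matched; ignore that.
            killall "$daemon" 2>/dev/null || true
        fi
    done
}
```

To use it as a real hook, a call to `cleanup_clients` would go on the last line of the script.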

If the machine is upgraded from older ubuntu versions check that there are no more references to /etc/init.d/networking from /etc/rc*.d directories.

Probably at this point it is not ifupdown that should be patched anymore, but I decided to post here since this is one of the bugs that most users will land on when searching for the problem.
If the above changes work for other users then separate bugs against each involved package should be filed. I hope someone else can eventually take care of it.

Ernie 07 (ernestboyd) wrote :

In my daily work archives are created to preserve snapshots of work in progress. In addition, they are extracted and the before/after folder structures are compared via Beyond Compare. IF Nvidia drivers would stop randomly hanging and thus forcing power resets, I might be able to reduce archiving activities significantly.

1. Should I advise my less-technical associates to SHUN Nvidia hardware because Nvidia continues to demonstrate that driver quality is not important?

2. Should I advise my less-technical associates to SHUN Ubuntu One and ALL versions of Ubuntu after 12.04 including any MOBILE offerings because bugs like this one do not get fixed?

watgrad (watgrad) wrote :

I also am concerned about this hidden bug - I only discovered there was an issue when I noticed that boot times were longer than normal. I have converted all my main computers to 12.10 64 bit...
They all show this behaviour with messages like:
[ 1.111309] EXT4-fs (sdb1): INFO: recovery required on readonly filesystem
[ 1.111314] EXT4-fs (sdb1): write access will be enabled during recovery
...
[ 2.127937] EXT4-fs (sdb1): orphan cleanup on readonly fs
[ 2.127946] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 30148324
[ 2.128033] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 30148284
[ 2.128042] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 30148298
[ 2.128057] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 44828301
[ 2.128091] EXT4-fs (sdb1): 4 orphan inodes deleted
[ 2.128093] EXT4-fs (sdb1): recovery complete

So I understand that this could lead to data loss? ...corrupt file system? I have tried to follow yota's suggested fix, but could not get that to work. On one laptop (macbookpro7,1 dual boot) shutting down networking before shutting down the computer stops the behaviour - but this is not true for the desktop systems (win7 dual boot).

The bug message says 'fix released' - how do I get and apply this fix? (I already have the rec version of ifupdown...)

Ubfan (ubfan1) wrote :

See bug 879120 for a case in which dbus pauses 60 seconds in shutting down the network. Jan 11, 2013 I am still seeing "/ busy" messages at shutdown on a USB ext2 file system (no errors found from external check).

Ernie 07 (ernestboyd) wrote :

This bug STILL occurs in 64-bit 13.04 (Raring) with kernel 3.8.0-0-generic.
Downloaded and tested today, 2013-01-14.

JamesTPG (paxos) wrote :

Bug still occurs
Asus U36JC, 128GB SSD instead of normal 500GB HDD
Xubuntu 12.10 (64bit)

mount: / is busy
# Will now halt
[xxx] Power down.
--Freeze--

tekkenlord (linuxfever) wrote :

Hello all,

I would just like to add that this issue also affects my desktop and laptop, the former with a wired connection and the latter with a wireless, both running quantal 64-bit.

I just tried the workaround in post #20 for my desktop and it seems to be working, i.e clean reboot/shutdown. This is of course far from ideal since at every restart I have to set up my wired connection from scratch (using KDE 4.9.5).

Thanks.

Ernie 07 (ernestboyd) wrote :

This bug STILL occurs in 64-bit 13.04 (Raring) with kernel 3.8.0-2-generic.
Downloaded and tested today, 2013-01-27.

René Sitt (crucifier) wrote :

Hello all,

I experienced this bug now for quite a while, since upgrading to Quantal (64bit) to be exact, and while trying some of the above suggestions, I observed the following:

Until recently, I had it fixed by following yota's advice (#21).
Then, after a kernel update to 3.5.0-22 (from 3.5.0-21, I think), the bug began reappearing, despite the workarounds.
Now yesterday, I updated to kernel 3.5.0-23 (from Quantal-Proposed), and it's working again...

From looking at the changesets, I really don't have a clue where the reappearing/disappearing of this bug comes from, yet I hope someone more adept may be able to work it out.

tekkenlord (linuxfever) wrote :

Hello all,

I can also confirm that the kernel update (3.5.0-23) seems to resolve the issue. Two restarts so far, both clean!
Thanks for the fix, hope we will not be seeing this bug in the future :)

Gard Spreemann (gspreemann) wrote :

I'm still seeing "mount: / is busy" and unclean shutdowns with 3.5.0-23.

tekkenlord (linuxfever) wrote :

@Gard Spreemann

You may have tried this but...

The very first shutdown after the kernel update will not be clean as the kernel has not been loaded yet. So, first boot in the updated kernel and then try to restart...good luck!

Gard Spreemann (gspreemann) wrote :

I've tried several reboots. The problem persists. It could be that I'm experiencing a different bug, though. What I see is, however, consistent with what's described here (except for the update fixing the problem).

Tue Bækgaard Holm (tueholm) wrote :

Finally found a workaround. yota's fix (#21) seemed to work fine for the time spent shutting the system down, but I still got the bug. Eventually I found out that it only happened when I was connected to a network, so I made a small script which shuts down my network interfaces when my computer shuts down, and this did the trick.

Simply put I created: "/etc/init.d/nw.sh"

Containing: "ifconfig <interface> down" for all my physical network interfaces.

Then as root:

ln /etc/init.d/nw.sh /etc/rc0.d/K10nw.sh
ln /etc/init.d/nw.sh /etc/rc6.d/K10nw.sh

And bam! My computer was suddenly able to shut down. Hope this helps.
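
A script along the lines described above might look like the following sketch. It assumes that interfaces backed by real hardware have a `device` entry under /sys/class/net, which is a common heuristic rather than a guarantee; the function names `physical_ifaces` and `down_physical_ifaces` are hypothetical.

```shell
#!/bin/sh
# List network interfaces backed by a physical device.
# Usage: physical_ifaces [SYSFS_NET_DIR]   (defaults to /sys/class/net)
physical_ifaces() {
    dir=${1:-/sys/class/net}
    for path in "$dir"/*; do
        # Virtual interfaces such as lo have no "device" entry in sysfs.
        [ -e "$path/device" ] || continue
        basename "$path"
    done
}

# Bring every physical interface down (needs root).
down_physical_ifaces() {
    for iface in $(physical_ifaces "$@"); do
        ifconfig "$iface" down
    done
}
```

Used as /etc/init.d/nw.sh, the script would end with a call to `down_physical_ifaces`, which avoids hard-coding interface names.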

Juan Simón (simonbcn) wrote :

If I execute "sudo service networking restart" or "sudo service networking stop", all ttyX become inaccessible and all graphical sessions are closed. It only shows a black screen with a cross (the mouse pointer) on it.

Juan Simón (simonbcn) wrote :

The workaround in comment #35 doesn't work for me.
My system takes ~40 seconds to reboot. I don't know what the system does in those 40 seconds, but it shows the messages in the attached screenshot.

Øyvind Stegard (oyvindstegard) wrote :

I still have what looks like exactly this issue on a clean 13.10 installation. It looks like it goes away if I disconnect from the wired network at the login screen before rebooting or shutting down. Is there a new bug filed for 13.10 somewhere?
