ifupdown initialization problems caused by race condition

Bug #1337873 reported by Rafael David Tinoco
This bug affects 14 people
Affects              Status        Importance  Assigned to       Milestone
ifenslave (Ubuntu)   Fix Released  Medium      Unassigned
  Precise            Won't Fix     Medium      Unassigned
  Trusty             Fix Released  Medium      Unassigned
  Vivid              Fix Released  Medium      Unassigned
  Wily               Fix Released  Medium      Unassigned
ifupdown (Debian)    Fix Released  Unknown
ifupdown (Ubuntu)    Fix Released  Medium      Dariusz Gadomski
  Precise            Won't Fix     Medium      Unassigned
  Trusty             Fix Released  Medium      Unassigned
  Vivid              Won't Fix     Medium      Unassigned
  Wily               Fix Released  Medium      Unassigned

Bug Description

[Impact]

 * Lack of proper synchronization in ifupdown causes a race condition resulting in occasional incorrect network interface initialization (e.g. in the bonding case: wrong bonding settings, or network unavailable because the slave<->master interfaces were initialized in the wrong order).

 * This is very annoying in case of large deployments (e.g. when bringing up 1000 machines it is almost guaranteed that at least a few of them will end up with network down).

 * It has been fixed by introducing a hierarchical, per-interface locking mechanism that ensures the right order of initialization (along with the correct order in the /etc/network/interfaces file).

[Test Case]

 1. Create a VM with bonding configured with at least 2 slave interfaces.
 2. Reboot.
 3. If all interfaces are up - go to 2.
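Step 3 can be checked mechanically. The following is a minimal sketch, assuming the bond status is read from /proc/net/bonding/<bond> (or a saved copy of it); the function name bond_ok and its arguments are made up for illustration and are not part of ifupdown or ifenslave.

```shell
#!/bin/sh
# bond_ok FILE EXPECTED_MODE EXPECTED_SLAVES
# FILE is /proc/net/bonding/<bond> or a saved copy of it.
# Succeeds only if the "Bonding Mode:" line mentions EXPECTED_MODE and
# exactly EXPECTED_SLAVES "Slave Interface:" entries are listed.
bond_ok() {
    file=$1; mode=$2; want=$3
    grep '^Bonding Mode:' "$file" | grep -qF -- "$mode" || return 1
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    have=$(grep -c '^Slave Interface:' "$file" || true)
    [ "$have" -eq "$want" ]
}
```

A reboot loop could call something like `bond_ok /proc/net/bonding/bond0 active-backup 2` after each boot and stop rebooting on the first failure.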

[Regression Potential]

 * This change has been introduced upstream in Debian.
 * It does not require any config changes to existing installations.

[Other Info]

Original bug description:

* please consider my bonding examples are using eth1 and eth2 as slave
 interfaces.

ifupdown has some race conditions, explained below. ifenslave does not
behave well with sysv networking and upstart network-interface scripts
running together.

!!!!
case 1)
(a) ifup eth0                     (b) ifup -a for eth0
-----------------------------------------------------------------
1-1. Lock ifstate.lock file.
                                  1-1. Wait for locking ifstate.lock
                                      file.
1-2. Read ifstate file to check
     the target NIC.
1-3. close(=release) ifstate.lock
     file.
1-4. Judge that the target NIC
     isn't processed.
                                  1-2. Read ifstate file to check
                                       the target NIC.
                                  1-3. close(=release) ifstate.lock
                                       file.
                                  1-4. Judge that the target NIC
                                       isn't processed.
2. Lock and update ifstate file.
   Release the lock.
                                  2. Lock and update ifstate file.
                                     Release the lock.
!!!
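The interleaving in case 1 is possible because the lock is released between the check (steps 1-2 to 1-4) and the update (step 2). A minimal sketch of the alternative, holding one lock across both steps, is shown here. It assumes flock(1) from util-linux is available; the file names and the function claim_iface are stand-ins for illustration, not ifupdown's real code or paths.

```shell
#!/bin/sh
# Sketch only: STATE_FILE and LOCK_FILE are stand-ins, not ifupdown's real paths.
STATE_DIR=$(mktemp -d)
STATE_FILE="$STATE_DIR/ifstate"
LOCK_FILE="$STATE_DIR/ifstate.lock"
: > "$STATE_FILE"

# claim_iface IFACE
# Prints "claimed" if IFACE was not yet recorded (and records it),
# or "busy" if an earlier caller already recorded it.
claim_iface() {
    (
        flock -x 9 || exit 1      # lock is held for the check AND the update
        if grep -qxF -- "$1" "$STATE_FILE"; then
            echo busy
        else
            echo "$1" >> "$STATE_FILE"
            echo claimed
        fi
    ) 9>"$LOCK_FILE"              # lock is released when the subshell exits
}
```

With this shape, two concurrent callers for the same interface cannot both conclude that the interface is unprocessed.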

to be explained

!!!
case 2)
(a) ifenslave of eth0             (b) ifenslave of eth0
------------------------------------------------------------------
3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
4. Link down the target NIC.
5. Write NIC id to
   /sys/class/net/bond0/bonding
   /slaves then NIC gets up
                                  4. Link down the target NIC.
                                  5. Fails to write NIC id to
                                     /sys/class/net/bond0/bonding/
                                     slaves it is already written.
!!!
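For case 2, one way to avoid the second, failing write is to check the slaves list before touching the link. This is only a sketch: is_enslaved and enslave_once are hypothetical helpers, and SLAVES_FILE stands in for /sys/class/net/<bond>/bonding/slaves, which lists the current slaves separated by spaces and accepts "+<iface>" to add one. Without a lock around it, the check itself is still racy.

```shell
#!/bin/sh
# is_enslaved SLAVES_FILE IFACE
# Succeeds if IFACE already appears in the space-separated slaves list.
is_enslaved() {
    for s in $(cat "$1"); do
        [ "$s" = "$2" ] && return 0
    done
    return 1
}

# enslave_once SLAVES_FILE IFACE
# Skips the link-down and the sysfs write when IFACE is already a slave.
enslave_once() {
    if is_enslaved "$1" "$2"; then
        echo "skip: $2 already enslaved"
        return 0
    fi
    ip link set "$2" down 2>/dev/null
    echo "+$2" > "$1"
}
```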

#####################################################################

#### My setup:

root@provisioned:~# cat /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=1 arp_interval=2000

Both /etc/init.d/networking and upstart network-interface are
enabled.

#### Beginning:

root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

I'm able to boot with both scripts (networking and network-interface
enabled) with no problem. I can also boot with only "networking"
script enabled:

---
root@provisioned:~# initctl list | grep network
network-interface stop/waiting
networking start/running
---

OR only the script "network-interface" enabled:

---
root@provisioned:~# initctl list | grep network
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running
---

#### Enabling bonding:

Following ifenslave configuration example (/usr/share/doc/ifenslave/
examples/two_hotplug_ethernet), my /etc/network/interfaces has to
look like this:

---
auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

Having both scripts running does not make any difference, since we
are missing the "bond-slaves" keyword that ifenslave needs in order
to enslave the interfaces, and the slaves are set to "manual".

Ifenslave code:

"""
for slave in $BOND_SLAVES ; do
...
    # Ensure $slave is down.
    ip link set "$slave" down 2>/dev/null
    if ! sysfs_add slaves "$slave" 2>/dev/null ; then
        echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
    else
        # Bring up slave if it is the target of an allow-bondX stanza.
        # This is usefull to bring up slaves that need extra setup.
        if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
            ifup $v --allow "$BOND_MASTER" "$slave"
        fi
"""

Without the keyword "bond-slaves" on the master interface declaration,
ifenslave will NOT bring any slave interface up on the "master"
interface ifup invocation.

*********** Part 1

So, having networking sysv init script AND upstart network-interface
script running together... the following example works:

---
root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1
    bond-slaves eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

Ifenslave script sets link down to all slave interfaces, declared by
"bond-slaves" keyword, and assigns them to correct bonding. Ifenslave
script ONLY tries to make a reentrant call to ifupdown if the slave
interfaces have "allow-bondX" stanza (not our case).

So this should not work, since when the master bonding interface
(bond0) is called, ifenslave does not configure slaves without
"allow-bondX" stanza. What is happening, why is it working ?

If we disable the upstart "network-interface" script, our bonding stops
working at boot. This is because upstart was the one setting
the slave interfaces up (with the configuration above) and not
the sysv networking scripts.

It is clear that ifenslave from sysv script invocation can set the
slave interface down anytime (even during upstart script execution)
so it might work and might not:

"""
ip link set "$slave" down 2>/dev/null
"""

root@provisioned:~# initctl list | grep network-interface
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (bond0) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running

Since having the interface down is a requirement to enslave it,
running both scripts together (upstart and sysv) could create a
situation where upstart puts the slave interface online but ifenslave
from the sysv script puts it down and never brings it up again (because
it does not have the "allow-bondX" stanza).

*********** Part 2

What if I disable upstart "network-interface", stay only with the sysv
script but introduce the "allow-bondX" stanza to slave interfaces ?

The funny part begins... without upstart, the ifupdown tool calls
ifenslave, for bond0 interface, and ifenslave calls this line:

"""
for slave in $BOND_SLAVES ; do
...
    if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
        ifup $v --allow "$BOND_MASTER" "$slave"
    fi
"""

But ifenslave keeps waiting forever for the bond0 interface to come
online. We have a chicken-and-egg situation now:

* ifupdown tries to put bond0 interface online.
* we are not running upstart network-interface script.
* ifupdown for bond0 calls ifenslave.
* ifenslave tries to find interfaces with "allow-bondX" stanza
* ifenslave tries to ifup slave interfaces with that stanza
* slave interfaces keep forever waiting for the master
* master is waiting for the slave interface
* slave interface is waiting for the master interface
... :D

And we have an infinite loop for ifenslave:

"""
# Wait for the master to be ready
[ ! -f /run/network/ifenslave.$BOND_MASTER ] &&
 echo "Waiting for bond master $BOND_MASTER to be ready"
while :; do
    if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
        break
    fi
    sleep 0.1
done
"""

*********** Conclusion

This can be triggered if the right conditions are set up (like the ones
I just showed). Running ifupdown without parallel executions (sysv and
upstart, for example) can cause an infinite loop during boot.

Having parallel ifupdown executions can trigger race conditions
between:

1) ifupdown itself (case 1 in the bug description).
2) ifupdown and ifenslave script (case 2 in the bug description).

Changed in ifupdown (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
Rafael David Tinoco (rafaeldtinoco) wrote :

Attaching script to reproduce described problem.

summary: - bonding initialization problems caused by race condition
+ Precise, Trusty, Utopic - bonding initialization problems caused by race
+ condition
summary: - Precise, Trusty, Utopic - bonding initialization problems caused by race
- condition
+ Precise, Trusty, Utopic - ifupdown initialization problems caused by
+ race condition
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "ifupdown_0.7.47.2ubuntu4.2~lp1337873.diff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Changed in ifupdown (Debian):
status: Unknown → New
description: updated
Rafael David Tinoco (rafaeldtinoco) wrote :

CORRECT WAY OF SETTING INTERFACES FILE FOR BONDING:

1) This model has race conditions.
2) YOU HAVE to have both scripts running (networking and network-interface)

# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1
    bond-slaves eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255

You can expect this to fail from time to time but it works... working on a fix for this.

description: updated
Rafael David Tinoco (rafaeldtinoco) wrote :

I have introduced one big lock for ifupdown. The ifup, ifdown or ifquery commands cannot be run simultaneously.

Since SEVERAL ifupdown pre/post scripts need to make reentrant calls to these commands, I created an environment variable that disables the locking when reentrant calls are made from these scripts. This way the sysv and upstart networking scripts will never step on each other's toes.

Attaching fix for ifupdown.

PS: This breaks even more ifenslave buggy behavior.. wait for next comments.
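
The idea of one big lock plus an environment variable for reentrant calls can be sketched in shell as follows. This illustrates the concept only and is not the attached patch (ifupdown itself is written in C); the variable name LOCK_HELD_SKETCH, the lock file and the function with_big_lock are invented for this example, and flock(1) from util-linux is assumed.

```shell
#!/bin/sh
# Stand-in lock file; not a path used by ifupdown.
BIG_LOCK=$(mktemp)

# with_big_lock CMD [ARGS...]
# Runs CMD while holding the global lock. If the (hypothetical) variable
# LOCK_HELD_SKETCH is set, an ancestor already holds the lock, so CMD
# runs directly instead of blocking on a lock it can never get.
with_big_lock() {
    if [ -n "${LOCK_HELD_SKETCH:-}" ]; then
        "$@"
        return
    fi
    (
        flock -x 9 || exit 1
        LOCK_HELD_SKETCH=1
        export LOCK_HELD_SKETCH
        "$@"
    ) 9>"$BIG_LOCK"
}
```

Without the variable check, a hook that calls back into the locked command would block forever on the lock held by its own parent.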

Rafael David Tinoco (rafaeldtinoco) wrote :

Let's try everything again from the beginning, but now with a fixed
ifupdown version (no race conditions between upstart and sysv
scripts). My interfaces file will be exactly the same as the one
proposed in the ifenslave examples:

---
auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

We do have bond0 created but still no bonding configured:

---
root@provisioned:~# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 62:64:29:45:df:ef
          BROADCAST MASTER MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

root@provisioned:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
---

Rafael David Tinoco (rafaeldtinoco) wrote :

Let's try adding "bond-slaves" to the master interface and fixing
the "bond-primary" keyword:

---
root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1
    bond-slaves eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

Still nothing...

---
root@provisioned:~# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 62:64:29:45:df:ef
          BROADCAST MASTER MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

root@provisioned:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
---

Rafael David Tinoco (rafaeldtinoco) wrote :

And you can check that upstart got deadlocked:

---
root@provisioned:~# ps -ef | grep ifup
root 618 1 0 10:21 ? 00:00:00 ifup --allow auto eth2
root 619 1 0 10:21 ? 00:00:00 ifup --allow auto eth1
root 620 1 0 10:21 ? 00:00:00 ifup --allow auto lo
root 621 1 0 10:21 ? 00:00:00 ifup --allow auto eth0
root 726 1 0 10:21 ? 00:00:00 ifup --allow auto bond0
root 739 733 0 10:21 ? 00:00:00 ifup -a

root@provisioned:~# for i in `ps -ef | grep ifup | grep -v grep | awk '{print $2}'`; do echo $i; cat /proc/$i/environ; done
618
...UPSTART_INSTANCE=eth2...
...UPSTART_INSTANCE=eth1...
...UPSTART_INSTANCE=lo...
...UPSTART_INSTANCE=eth0...
...INSTANCE=UPSTART_JOB=networking...
---

As I said before, sysv scripts and upstart scripts were depending on
each other to run in parallel (unfortunately with race conditions)
to configure bonding. We can see here that one of the upstart networking
processes (networking or network-interface) got the lock and is
in an infinite loop waiting for another instance, which in turn is
waiting for the lock.

---
root@provisioned:~# ps -ef | grep ifenslave
root 647 641 0 10:21 ? 00:00:00 /bin/sh /etc/network/if-pre-up.d/ifenslave
---

Rafael David Tinoco (rafaeldtinoco) wrote :

YES!

---
root@provisioned:~# pstree -a
...
├─ifup --allow auto eth2
│   └─sh -c run-parts /etc/network/if-pre-up.d
│       └─run-parts /etc/network/if-pre-up.d
│           └─ifenslave /etc/network/if-pre-up.d/ifenslave
│               └─sleep 0.1
---

One slave interface, eth2 in this case, got the ifupdown lock and is
running an infinite loop waiting for the master bonding interface, which
will never run without the lock.

To summarize:

So bonding needed both networking scripts running (network-interface
and networking) to work, AND having both scripts running
would sometimes cause race conditions. Disabling one of the scripts
would also cause a race condition if the right triggers are set (like I
showed in this example). Fixing the ifupdown race conditions led me to
realize ifenslave is making wrong decisions and can cause deadlocks.

Ifenslave must be fixed together...

* wait for next comments.

Rafael David Tinoco (rafaeldtinoco) wrote :

Checking Ubuntu bzr tree...

---
<email address hidden>:/bugs/00064811/sources/bazaar/ubuntu/$ git clone "bzr::lp:ubuntu/ifenslave"
Cloning into 'ifenslave'...
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Checking connectivity... done.
---
---
<email address hidden>:/bugs/00064811/sources/bazaar/ubuntu/ifenslave$ git tag | grep -e 2.4 -e 2.5
2.4
2.4ubuntu1
2.5
2.5ubuntu1
---
---
<email address hidden>:/bugs/00064811/sources/bazaar/ubuntu/ifenslave$ git checkout 2.4
Previous HEAD position was 64392a5... Re-apply Ubuntu delta to new source.
HEAD is now at 1d22c9b... Added "ifenslave-2.6.prerm" to remove dangling alternatives (Closes: #736668). Thanks to Andreas Beckmann.
<email address hidden>:/bugs/00064811/sources/bazaar/ubuntu/ifenslave$ git checkout 2.4ubuntu1
Previous HEAD position was 1d22c9b... Added "ifenslave-2.6.prerm" to remove dangling alternatives (Closes: #736668). Thanks to Andreas Beckmann.
HEAD is now at 64392a5... Re-apply Ubuntu delta to new source.
---
---
<email address hidden>:/bugs/00064811/sources/bazaar/ubuntu/ifenslave$ git checkout 2.5
Previous HEAD position was 64392a5... Re-apply Ubuntu delta to new source.
HEAD is now at 1701e16... * "ifupdown (>= 0.7.46)" compatibility update (Closes: #742410). Thanks to Andrew Shadura. * Added versioned Depends on "ifupdown (>= 0.7.46)".
<email address hidden>:/bugs/00064811/sources/bazaar/ubuntu/ifenslave$ git checkout 2.5ubuntu1
Previous HEAD position was 1701e16... * "ifupdown (>= 0.7.46)" compatibility update (Closes: #742410). Thanks to Andrew Shadura. * Added versioned Depends on "ifupdown (>= 0.7.46)".
HEAD is now at e47d568... * Merge from Debian unstable. Remaining changes: - Upstart event based bond bringup: + Drop ethernet+wifi example + Drop two_ethernet example + Update ethernet+hotplug_wifi example + Update two_hotplug_ethernet example + Update pre-up and post-down scripts for event bringup + Update README.Debian examples - Update scripts to use /run/network/ifstate instead of /etc/network/run/ifstate
---

I could see that we diverged from the upstream code (Debian's) in favor of some other modifications.
The Ubuntu fix would be different than a possible upstream fix.

We have nowadays:

<email address hidden>:/bugs/00064811/sources/trusty/ifenslave$ rmadison ifenslave
 ifenslave | 2.4ubuntu1 | trusty | source, all
 ifenslave | 2.5ubuntu1 | utopic | source, all

Both need fixes for this particular case.

* wait for next comments.

Rafael David Tinoco (rafaeldtinoco) wrote :

After talking to Stéphane Graber, from Ubuntu Core Foundations Team, we decided that I should implement independent locking for every interface (like I have already proposed to Debian upstream project) and to implement locking mechanisms for dependent interfaces inside the hooks.

So:

1) ifupdown would lock every given interface (or all if "-a" is given).

2) locking for child interfaces (slaves for bonding, attached to bridges, ...) is going to be done inside hooks. Today most important hooks for ifupdown are: bridging, vlan and bonding. I have to guarantee those 3 are ok with any change made to ifupdown tool.
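
Point 1 can be pictured as one lock file per interface. The sketch below is a shell illustration under assumptions: the lock directory, the function name try_with_iface_lock and the exit code 75 are made up here, flock(1) is assumed to be installed, and the real change belongs in ifupdown's C code.

```shell
#!/bin/sh
# Stand-in lock directory; not a path used by ifupdown.
LOCK_DIR=$(mktemp -d)

# try_with_iface_lock IFACE CMD [ARGS...]
# Runs CMD while holding IFACE's own lock. If another process already
# holds that lock, returns 75 without running CMD.
try_with_iface_lock() {
    iface=$1; shift
    (
        flock -n 9 || exit 75     # non-blocking: fail fast if taken
        "$@"
    ) 9>"$LOCK_DIR/$iface.lock"
}
```

Operations on different interfaces no longer serialize on each other, while two operations on the same interface still exclude one another.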

* wait for next comments/patches/actions.

Iain Lane (laney) wrote :

(unsubscribing ~ubuntu-sponsors per comment #13, please re-subscribe when patches are ready)

Rafael David Tinoco (rafaeldtinoco) wrote :

Discussing this with Foundations, we concluded that ifupdown should not only lock on a per-interface basis, but should also have a way of creating a hierarchy of interfaces (where locking the master one would imply that all its slaves are locked as well, for vlans, aliases, bridging, etc.), so that in a possible parallel execution ifupdown would obey those restrictions and configure interfaces in the proper order, with locking guaranteed.

I'm preparing those changes and I'll suggest them upstream. If they get accepted I'll provide SRUs for precise and trusty. If the SRUs or the upstream code proposal are not accepted, I may create a parallel ifupdown package, maintained by me, to address those issues.
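
A rough shell sketch of hierarchical locking: the caller passes the master followed by its dependents, and the locks are taken in that fixed order before the command runs. The function with_hier_lock, the lock directory and the way the hierarchy is passed in are assumptions for illustration only; how ifupdown actually discovers the hierarchy is not shown here, and flock(1) is assumed to be available.

```shell
#!/bin/sh
# Stand-in lock directory; not a path used by ifupdown.
LOCK_DIR=$(mktemp -d)

# with_hier_lock "MASTER SLAVE1 SLAVE2 ..." CMD [ARGS...]
# Takes one lock per listed interface, always in list order, then runs
# CMD. Every lock stays held until CMD returns.
with_hier_lock() {
    ifaces=$1; shift
    if [ -z "$ifaces" ]; then
        "$@"
        return
    fi
    first=${ifaces%% *}           # head of the list
    rest=${ifaces#"$first"}       # tail of the list
    rest=${rest# }
    (
        flock -x 9 || exit 1
        with_hier_lock "$rest" "$@"
    ) 9>"$LOCK_DIR/$first.lock"
}
```

Because every caller takes the master's lock first, two parallel runs that touch the same group cannot each end up holding half of the locks.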

Thank you.. Coming back to this soon.

tags: added: cts
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm getting back to this after some time. After the discussion was brought upstream we did not get feedback regarding the proposed changes, but investigating further it is clear that ifupdown suffers from race conditions that cannot be solved simply by creating:

1) big lock - since ifup/ifdown/ifquery are reentrant*
2) big lock - does not address interface order/priority for parallel executions**
3) fine-grained lock - does not address interface order/priority for parallel executions**

* could be solved by an ENV variable being set (so as not to lock children) by the up/down scripts.
** groups of interfaces such as a bridge and all interfaces connected to it, or an interface and all vlans connected to it

Final approach here will be to guarantee:

1) interfaces should be locked independently on executions
2) locks have to respect interface hierarchy (locking group of inter-connected interfaces such as bridges/interfaces, interfaces/vlans)
3) all up/down scripts have to be reviewed after any locking mechanism change (deadlock by reentrant calls)

IMO

1) stanzas should be created to "group" interfaces to be locked (for parallel executions) respecting hierarchy/order between them
2) locking/state have to be together and independent

FINALLY

The change to guarantee all that will involve code AND interfaces file change (for adding special stanzas to make sure appropriate order and locking is done during interfaces activation). It is not clear if this change will be smooth enough for a "stable release update". If not I'll try to provide a PPA to address any needed code-change for those who suffer from this issue.

FOR NOW

The only way to guarantee interfaces activation ordering (without suffering from intermittent race conditions like the one explained on this bug) would be to activate interface one by one outside sysv/upstart scripts OR to use "pre/post" commands with reentrant calls to ifupdown based on the desired order.
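
As a sketch of the first workaround (activating interfaces one by one, outside the sysv/upstart scripts), a small wrapper can run ifup strictly in the order given and stop at the first failure. The function name bring_up_in_order and the IFUP override are assumptions for illustration; the interface names and their order in any call are placeholders that depend on the setup.

```shell
#!/bin/sh
# IFUP can be overridden (e.g. IFUP=echo) to dry-run the sequence;
# by default the real ifup command is used.
IFUP=${IFUP:-ifup}

# bring_up_in_order IFACE...
# Activates the interfaces strictly one after another, in the order
# given, and stops at the first failure.
bring_up_in_order() {
    for iface in "$@"; do
        $IFUP "$iface" || { echo "failed: $iface" >&2; return 1; }
    done
}
```

Running it with `IFUP=echo` prints the sequence without touching any interface, which is a cheap way to check the intended order first.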

Any comments here are much appreciated.

Thank you

Rafael Tinoco

Changed in ifupdown (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in ifupdown (Ubuntu):
assignee: nobody → Dariusz Gadomski (dgadomski)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifupdown (Ubuntu Precise):
status: New → Confirmed
Changed in ifupdown (Ubuntu Trusty):
status: New → Confirmed
Changed in ifupdown (Ubuntu Vivid):
status: New → Confirmed
Dariusz Gadomski (dgadomski) wrote :

Adding SRU proposal for wily.

description: updated
Dariusz Gadomski (dgadomski) wrote :

Adding SRU proposal for Vivid.

Dariusz Gadomski (dgadomski) wrote :

Adding SRU proposal for Trusty.

Dariusz Gadomski (dgadomski) wrote :

Adding SRU proposal for Trusty (to make ifenslave compatible with ifupdown changes).

description: updated
Mathew Hodson (mhodson)
Changed in ifupdown (Ubuntu):
importance: Undecided → Medium
Changed in ifupdown (Ubuntu Precise):
importance: Undecided → Medium
Changed in ifupdown (Ubuntu Trusty):
importance: Undecided → Medium
Changed in ifupdown (Ubuntu Vivid):
importance: Undecided → Medium
Dariusz Gadomski (dgadomski) wrote :

Adding SRU proposal for Xenial.

Martin Pitt (pitti) wrote :

Sponsored the patch for xenial. Let's give this some maturing there first.

Changed in ifupdown (Ubuntu):
status: In Progress → Fix Committed
Sebastien Bacher (seb128) wrote :

(unsubscribing sponsors for now then, please subscribe them back after getting some feedback from the xenial update)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7.54ubuntu2

---------------
ifupdown (0.7.54ubuntu2) xenial; urgency=medium

  * Per-interface hierarchical locking. Backported from Debian git head.
    (LP: #1337873, Closes: #753755)

 -- Dariusz Gadomski <email address hidden> Thu, 10 Nov 2015 11:30:14 +0200

Changed in ifupdown (Ubuntu):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

I sponsored the trusty and wily patches.

Changed in ifupdown (Ubuntu Vivid):
status: Confirmed → Won't Fix
Changed in ifupdown (Ubuntu Wily):
status: New → In Progress
Changed in ifupdown (Ubuntu Trusty):
status: Confirmed → In Progress
Changed in ifenslave (Ubuntu Wily):
status: New → Fix Released
Changed in ifenslave (Ubuntu Vivid):
status: New → Fix Released
Martin Pitt (pitti) wrote :

Setting precise tasks to "wontfix", this is too complex to backport and the bug is not nearly important enough to risk regressions due to too invasive backports.

Changed in ifenslave (Ubuntu Trusty):
status: New → In Progress
Changed in ifenslave (Ubuntu Precise):
status: New → Won't Fix
Changed in ifupdown (Ubuntu Precise):
status: Confirmed → Won't Fix
Changed in ifenslave (Ubuntu):
status: New → Fix Released
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Rafael, or anyone else affected,

Accepted ifupdown into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.7.47.2ubuntu4.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Brian Murray (brian-murray) wrote :

Hello Rafael, or anyone else affected,

Accepted ifenslave into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifenslave/2.4ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifenslave (Ubuntu Trusty):
status: In Progress → Fix Committed
Brian Murray (brian-murray) wrote :

Hello Rafael, or anyone else affected,

Accepted ifupdown into wily-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.7.54ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Wily):
status: In Progress → Fix Committed
Mathew Hodson (mhodson)
Changed in ifupdown (Ubuntu Wily):
importance: Undecided → Medium
Changed in ifenslave (Ubuntu):
importance: Undecided → Medium
Changed in ifenslave (Ubuntu Precise):
importance: Undecided → Medium
Changed in ifenslave (Ubuntu Trusty):
importance: Undecided → Medium
Changed in ifenslave (Ubuntu Vivid):
importance: Undecided → Medium
Changed in ifenslave (Ubuntu Wily):
importance: Undecided → Medium
Dariusz Gadomski (dgadomski) wrote : Re: Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition

I have verified both Trusty and Wily. The verification was an automated cyclic reboot of a VM containing 3 NICs, 2 of which were bonded in active-backup mode. Before the fix was implemented, this test failed with some interfaces uninitialized or with the wrong bonding mode (the default round-robin was set).

This time, with the -proposed versions, after over 48 hours of the test none of the symptoms occurred.

Thus, tagging as verified.

tags: added: sts verification-done
removed: cts verification-needed
Martin Pitt (pitti) wrote :

Since per-interface locking landed in Xenial, we've been getting crashes, see bug 1532722. Until this is fixed, I'm marking this as v-failed. We'll then need to update the SRU with this fix as well.

tags: added: verification-failed
removed: verification-done
Dariusz Gadomski (dgadomski) wrote :

SRU proposal for Trusty (extended with fix to bug #1532722)

Dariusz Gadomski (dgadomski) wrote :

SRU proposal for Wily (extended with fix to bug #1532722)

Martin Pitt (pitti)
summary: - Precise, Trusty, Utopic - ifupdown initialization problems caused by
- race condition
+ ifupdown initialization problems caused by race condition
Dariusz Gadomski (dgadomski) wrote :

Updated SRU proposal for Trusty (fix to bug #1532722)

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

New SRU proposal for Wily (with fix to bug #1532722)

Revision history for this message
Martin Pitt (pitti) wrote :

I sponsored the updated trusty/wily patches, thanks!

Changed in ifupdown (Ubuntu Trusty):
status: Fix Committed → In Progress
Changed in ifupdown (Ubuntu Wily):
status: Fix Committed → In Progress
Adam Conrad (adconrad)
Changed in ifupdown (Ubuntu Trusty):
status: In Progress → Fix Committed
Changed in ifupdown (Ubuntu Wily):
status: In Progress → Fix Committed
tags: added: verification-needed
removed: verification-failed
Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I have added the fix for bug 1532722 and, after several days of testing, did not observe any other issues on Trusty and Wily, so I'm tagging this as verified.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for ifenslave has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifenslave - 2.4ubuntu1.2

---------------
ifenslave (2.4ubuntu1.2) trusty; urgency=medium

  * Don't depend on /run/network/ifstate. (LP: #1337873)

 -- Dariusz Gadomski <email address hidden> Thu, 01 Oct 2015 11:30:24 +0200

Changed in ifenslave (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7.47.2ubuntu4.3

---------------
ifupdown (0.7.47.2ubuntu4.3) trusty; urgency=medium

  [ Martin Pitt ]
  * Fix ifquery crash if interface state file does not exist yet.
    (Closes: #810779, LP: #1532722)

 -- Dariusz Gadomski <email address hidden> Tue, 12 Jan 2016 11:05:16 +0100

Changed in ifupdown (Ubuntu Trusty):
status: Fix Committed → Fix Released
Martin Pitt (pitti)
Changed in ifupdown (Ubuntu Wily):
status: Fix Committed → Fix Released
Revision history for this message
Max Krasilnikov (pseudo) wrote :

Hello!

I am running Ubuntu 14.04.3 LTS.

This update introduces a problem in my setup, which was adapted to the old behavior:
auto eth2
iface eth2 inet manual
        bond-master bond0
        up ip link set $IFACE txqueuelen 10000

auto eth3
iface eth3 inet manual
        bond-master bond0
        up ip link set $IFACE txqueuelen 10000

auto bond0
iface bond0 inet static
        address 10.0.66.3
        netmask 255.255.255.0
        bond-mode 802.3ad
        bond-lacp-rate 1
        bond-slaves none
        pre-ip ifup eth2
        pre-up ifup eth3
        up ip link set $IFACE txqueuelen 10000

Interface bond0 does not come up:

root@storage003:~# ps axu |grep ifup
root 780 0.0 0.0 4392 1448 ? Ss 00:03 0:00 ifup --allow auto eth3
root 783 0.0 0.0 4392 1460 ? Ss 00:03 0:00 ifup --allow auto eth2
root 1067 0.0 0.0 4392 1516 ? Ss 00:03 0:00 ifup --allow auto bond0
root 1087 0.0 0.0 4448 668 ? S 00:03 0:00 /bin/sh -c ifup eth3
root 1088 0.0 0.0 4388 1344 ? S 00:03 0:00 ifup eth3
root 2150 0.0 0.0 4388 1548 ? S 00:03 0:00 ifup -a

root@storage003:~# ps axu |grep ifenslave
root 816 0.1 0.0 4448 1436 ? S 00:03 0:48 /bin/sh /etc/network/if-pre-up.d/ifenslave
root 817 0.1 0.0 4448 1504 ? S 00:03 0:48 /bin/sh /etc/network/if-pre-up.d/ifenslave
root 1792323 0.0 0.0 15124 2136 pts/0 S+ 11:39 0:00 grep --color=auto ifenslave

root@storage003:~# for i in `ps -ef | grep ifup | grep -v grep | awk '{print $2}'`; do echo $i; cat /proc/$i/environ; done
780
ID_BUS=pciUPSTART_INSTANCE=eth3ACTION=addID_VENDOR_FROM_DATABASE=Intel CorporationSEQNUM=3065USEC_INITIALIZED=9113IFINDEX=4KERNEL=eth3DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:03:00.0/net/eth3ID_OUI_FROM_DATABASE=Intel CorporateUPSTART_JOB=network-interfaceTERM=linuxSUBSYSTEM=netID_MODEL_ID=0x154dPATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/binID_NET_NAME_MAC=enxa0369f209924ID_MODEL_FROM_DATABASE=10GbE 2P X520 AdapterUPSTART_EVENTS=net-device-addedINTERFACE=eth3PWD=/ID_VENDOR_ID=0x8086ID_NET_NAME_PATH=enp3s0f0ID_PCI_CLASS_FROM_DATABASE=Network controllerID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
783
ID_BUS=pciUPSTART_INSTANCE=eth2ACTION=addID_VENDOR_FROM_DATABASE=Intel CorporationSEQNUM=3067USEC_INITIALIZED=9131IFINDEX=5KERNEL=eth2DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:03:00.1/net/eth2ID_OUI_FROM_DATABASE=Intel CorporateUPSTART_JOB=network-interfaceTERM=linuxSUBSYSTEM=netID_MODEL_ID=0x154dPATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/binID_NET_NAME_MAC=enxa0369f209926ID_MODEL_FROM_DATABASE=10GbE 2P X520 AdapterUPSTART_EVENTS=net-device-addedINTERFACE=eth2PWD=/ID_VENDOR_ID=0x8086ID_NET_NAME_PATH=enp3s0f1ID_PCI_CLASS_FROM_DATABASE=Network controllerID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
1067
UPSTART_INSTANCE=bond0ACTION=addSEQNUM=3979USEC_INITIALIZED=5735IFINDEX=6KERNEL=bond0DEVPATH=/devices/virtual/net/bond0UPSTART_JOB=network-interfaceTERM=linuxSUBSYSTEM=netPATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/binUPSTART_EVENTS=net-device-addedINT...
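
The variables in a dump like this run together because /proc/<pid>/environ separates its entries with NUL bytes rather than newlines. A small sketch for producing a readable dump follows; print_environ is a hypothetical helper name, not something shipped with ifupdown.

```shell
#!/bin/sh
# Print each variable of a NUL-separated environ file on its own line.
# $1 = path such as /proc/780/environ
print_environ() {
    tr '\000' '\n' < "$1"
}

# Example loop over the ifup processes (left commented out here):
#   for pid in $(pgrep ifup); do
#       echo "== $pid"
#       print_environ "/proc/$pid/environ"
#   done
```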

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Hello Max,

My guess (supported by a test I ran in a test environment) is that the problem is caused by these lines under iface bond0:
        pre-ip ifup eth2
        pre-up ifup eth3

These lines most probably cause a deadlock, since the new release adds locking to fix the race condition behind the original issue (described above).
Removing them will bring your configuration in line with the supported convention documented in /usr/share/doc/ifenslave/README.Debian.gz.

In your case ifupdown is responsible for bringing up the eth2 and eth3 devices while setting up bond0, so no additional actions are needed in the bond0 section; please rely on this.
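
To look for the same pattern in other configurations, a rough lint along the following lines might help. It is only a sketch: find_nested_ifup is a hypothetical helper, and it assumes that a nested call appears literally as ifup or ifdown on a pre-up/up/post-up/pre-down/down/post-down line.

```shell
#!/bin/sh
# Hypothetical lint: list hook lines in an interfaces file that themselves
# call ifup or ifdown.
# $1 = file to scan (normally /etc/network/interfaces)
# Exit status 0 means at least one nested call was found (grep semantics).
# Lines with any other keyword (such as the pre-ip line quoted above) are
# not matched.
find_nested_ifup() {
    grep -nE '^[[:space:]]*(pre-up|up|post-up|pre-down|down|post-down)[[:space:]]+(.*[[:space:]/])?if(up|down)([[:space:]]|$)' "$1"
}
```

Running find_nested_ifup /etc/network/interfaces prints each suspicious line with its line number; no output means no such hook was found.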

Revision history for this message
Max Krasilnikov (pseudo) wrote :

Thanks a lot, all is working now. My bad.

Revision history for this message
Swe W Aung (sirswa) wrote :

Hi all

I upgraded ifenslave and ifupdown to 2.4ubuntu1.2 and 0.7.47.2ubuntu4.3 respectively. After a reboot, the bonding did not come up correctly: the MTU was set wrongly (it defaulted to 1500), the default gateway was not set, and the nameserver information was not written to /etc/resolv.conf.

After downgrading ifupdown to 0.7.47.2ubuntu4 and rebooting the server, everything came up fine again.

root@rcstodc1r24-01-ac:/etc/network# apt-cache policy ifenslave
ifenslave:
  Installed: 2.4ubuntu1.2
  Candidate: 2.4ubuntu1.2

root@rcstodc1r24-01-ac:/etc/network# apt-cache policy ifupdown
ifupdown:
  Installed: 0.7.47.2ubuntu4
  Candidate: 0.7.47.2ubuntu4.3

root@rcstodc1r24-01-ac:/etc/network# uname -a
Linux rcstodc1r24-01-ac 3.13.0-45-generic #74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@rcstodc1r24-01-ac:/etc/network# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"

/etc/network/interfaces

<snip>
auto p5p1
iface p5p1 inet manual
mtu 9000

auto p5p2
iface p5p2 inet manual
mtu 9000

auto p5p1.104
iface p5p1.104 inet manual
bond-master bond104
bond-primary p5p1.104
mtu 9000

auto p5p2.104
iface p5p2.104 inet manual
bond-master bond104
mtu 9000

auto bond104
iface bond104 inet static
address X.X.X.X
netmask 255.255.248.0
network X.X.X.X
broadcast X.X.X.X
gateway X.X.X.X
dns-nameservers X.X.X.X
dns-search erc.monash.edu.au
bond-miimon 100
bond-mode 1
mtu 9000
bond-primary p5p1.104
bond-slaves none

Since the server belongs to a storage cluster server pool, I could not hold on to it for long, so I downgraded the ifupdown package and rejoined the production pool.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

@sirswa I tried to reproduce this in my environment - unsuccessfully.

I created a similar config; please take a look at it for reference: http://paste.ubuntu.com/15129899/
Maybe you can spot a difference that I overlooked.

This could mean that the change interferes with something in your system that we did not take into consideration.

Could you provide the output of:
service --status-all

and
find /etc/network
(to see which if-*.d scripts you are running).

Thanks!
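
If the full find output is too noisy, the hook scripts alone can be listed with something like the sketch below. list_hook_scripts is a hypothetical helper; it assumes the hooks live in if-*.d directories under the directory passed in.

```shell
#!/bin/sh
# List the entries of all if-*.d hook directories under a config directory.
# $1 = directory to scan (normally /etc/network)
# Symlinked hooks are included; only directories are skipped.
list_hook_scripts() {
    find "$1" -path '*/if-*.d/*' ! -type d | sort
}
```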

Revision history for this message
Swe W Aung (sirswa) wrote :

Hi Dariusz

Thanks for looking into it. The config looks like mine.
The p3p1 interfaces are Mellanox CX PCI cards; could the cause be that their modules are not yet loaded when bonding starts?

I have upgraded ifupdown to version 0.7.47.2ubuntu4.3 to get more logs. I have attached:

service --status-all output
apt-cache for ifenslave and ifupdown
dmesg

http://paste.ubuntu.com/15175284/

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Thank you for the report, sirswa.

I have analyzed your config and came to some conclusions. You may want to consider using one of the approaches below:
* giving up on bonding and replacing it with bridging in STP mode (please consult the man pages at http://manpages.ubuntu.com/cgi-bin/search.py?q=brctl)
* implementing your VLANs on top of the bonding interfaces instead of the physical interfaces (i.e. defining bond104.104, bond944.944 and bond945.945 instead of p5p1.104, p5p1.944 etc.). The configuration you were using places the VLAN layer below the bonding layer and could produce unexpected behaviour. Please use the approach described here as a reference: https://www.stgraber.org/2012/01/04/networking-in-ubuntu-12-04-lts/
https://www.kernel.org/doc/Documentation/networking/bonding.txt
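
As a rough illustration of the second approach, the sketch below writes a possible layout to a scratch file for review. It is an assumption-laden example, not a tested configuration: write_vlan_over_bond_sketch is a hypothetical helper, the interface and bond names are taken from this thread, the X.X.X.X values are the placeholders used above, vlan-raw-device assumes the vlan package is installed, and MTU and DNS options are omitted.

```shell
#!/bin/sh
# Illustrative only: writes a layout with VLAN 104 on top of the bond to the
# file named in $1. Nothing under /etc/network is touched.
write_vlan_over_bond_sketch() {
    cat > "$1" <<'EOF'
auto p5p1
iface p5p1 inet manual
    bond-master bond104

auto p5p2
iface p5p2 inet manual
    bond-master bond104

auto bond104
iface bond104 inet manual
    bond-mode 1
    bond-miimon 100
    bond-primary p5p1
    bond-slaves none

# The VLAN sits on the bond, not on the physical NICs.
auto bond104.104
iface bond104.104 inet static
    address X.X.X.X
    netmask 255.255.248.0
    gateway X.X.X.X
    vlan-raw-device bond104
EOF
}
```

The physical interfaces are enslaved directly and the tagged interface is stacked on the bond, which is the reverse of the p5p1.104/p5p2.104 layout posted earlier.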

I am aware that the configuration you are using was working before, but despite this fact it was never supported. The latest changes made to ifupdown just exposed that fact.

Changed in ifupdown (Debian):
status: New → Fix Released