Comment 18 for bug 1947099

Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Hi Robie,

The user story here is about improving the time it takes to boot a Bionic instance on Oracle Cloud in a specific bare metal shape, called BM.GPU4.8. This is a pretty large instance, with 18x Ethernet controller [0200]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]:

https://www.oracle.com/uk/cloud/compute/gpu.html

Comparing the time it takes to boot Bionic with a CentOS instance we can see a big delta both in kernel and userspace:

CentOS:
Startup finished in 4min 3.548s (firmware) + 2min 23.525s (loader) + 3min 17.984s (kernel) + 1min 37.825s (initrd) + 38.723s (userspace) = 12min 1.608s
multi-user.target reached after 38.714s in userspace

Ubuntu:
Startup finished in 4min 621ms (firmware) + 2min 23.174s (loader) + 7min 34.767s (kernel) + 5min 48.219s (userspace) = 19min 46.782s
graphical.target reached after 5min 47.854s in userspace

The userspace difference is a cloud-init problem that is being handled by other launchpad bugs. Here we are trying to focus on the kernel difference, which further was narrowed down to be related to klibc and it is what we are trying to address with this bug. Some details:

Looking into details, here are the dmesg output from CentOS and Ubuntu, for kernel comparison:

Ubuntu: https://pastebin.canonical.com/p/X7GFVjdV3b/
CentOS: https://pastebin.canonical.com/p/gNhFwJjyjw/

Looking there, one area that looks promising is around mlx5_core driver, where Ubuntu spends 218 seconds, while CentOS spends 44 seconds

Khalid did a nice job investigating and found that the problem came from ipconfig from initramfs. Here are the details:

The delay from the kernel side is due to the fact that every interface (18 in total) takes over 10 seconds each to initialize.

The interfaces are brought up and used for DHCP as part of the iscsi initialization, during the initramfs stage. Specifically the script ```/usr/share/initramfs-tools/scripts/local-top/iscsi``` which is included in the initramfs and is installed as part of open-iscsi

To initialize the network interfaces this script calls other helper functions that are part of the initramfs, specifically "configure_networking" in ```/usr/share/initramfs-tools/scripts/functions``` which is installed as part of initramfs-tools-core.

The configure_networking function initializes networking by looping over all interfaces and issuing this call:

```
ipconfig -t 2 <iface>
```

where it is calling the ipconfig utility (part of klibc-utils) which also gets bundled in the initramfs. The -t 2 specifies a timeout of 2 seconds.

This ipconfig tool attempts to perform DHCP on the given interface, within the specified timeout (always 2 seconds for each interface)

Normally, ipconfig does not perform only one DHCP attempt - it performs multiple DHCP attempts, each attempt waiting for an exponentially increasing amount of time (1s, 2s, 4s, ..etc) up until the total amount of time has passed that is equal the user-specified timeout (2 in this case, so it should wait 1 second, followed by 1 second, for a total of 2 seconds).

However, it appears ipconfig has a bug. In case it encounters an error sending a DHCP request on an interface (which is the case for 17 out of the 18 interfaces on this instance), it attempts to "try again later" and sets a timer of 10 seconds for the next attempt - ignoring completely the user-specified value (2 seconds). I believe this is a bug in ipconfig and have a working fix to limit the delay to the user-specified timeout so each interface actually takes a maximum of 2 seconds for initialization, even in error cases.

The above is for bionic, which has the reported problem. In focal, even though the kernel and klibc-utils are identical to bionic, the version of initramfs-tools-core is different. In the focal version, configure_networking() has been reworked to avoid using ipconfig entirely. Instead, it uses ```dhclient```. And instead of doing that 18 times, dhclient is called only once with the full list of interfaces, as such:

```
dhclient -v -1 -cf /etc/dhcp/dhclient.conf.2 -pf /tmp/dhclient.pid.2 -4 enp12s0f0np0 enp12s0f1np1 enp138s0f0np0 enp138s0f1np1 enp148s0f0np0 enp148s0f1np1 enp195s0f0np0 enp195s0f1np1 enp209s0f0np0 enp209s0f1np1 enp22s0f0np0 enp22s0f1np1 enp45s0f0np0 enp45s0f1np1 enp72s0f0np0 enp72s0f1np1 enp76s0f0np0 enp76s0f1np1
```

dhclient does DHCP on all those interfaces at the same time, asynchronously, and it daemonizes (goes in the background) as soon as it gets a DHCP response which happens relatively quickly (on interface enp45s0f0np0 usually) and so the boot-up continues *much* faster compared to bionic (180+ seconds are reduced to just 5-10 seconds).

I'm sure there might be other use cases, but this is what we have been reported and are trying to work with. I hope this helps clarifying.