Comment 2 for bug 1926139

Chris Patterson (cjp256) wrote : Re: dhclient doesn't receive dhcp offer from kernel

We've been investigating a similar issue in Ubuntu 20.04 (and now 22.04) on Azure, where Running PPS re-use fails to complete DHCP for 5 minutes: cloud-init invokes dhclient, but dhclient never sees a DHCPOFFER. The failure rate varies for unknown reasons, but over time it has affected roughly 0.3-2% of deployments in this scenario.

We instrumented our images to capture network traffic and, sure enough, DHCP offers are reaching the guest but dhclient doesn't see them. We also instrumented dhclient, and the "got_one()" callback is never invoked in these failures.

18.04 does not have this issue.

This behavior can be reproduced multiple ways:
- Reproduce a test environment similar to the above scenario using cloud-init (switch the hyperv nic to a different vnet while waiting for the link status to reset, then perform dhcp). This test case reproduces within ~1,500 runs, though it varies and requires a more complex setup.
- Repeatedly run dhclient in a loop until it fails (see test-sequential.sh). It may take a while, but even this simple test will reproduce this behavior in ~50k runs for me in an LXD VM.
- Simply launch instances of dhclient in parallel (see test-parallel.sh). There is an excellent chance at least one of those dhclients will fail this way.
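
For readers without the attachments, the sequential approach can be sketched roughly as below. This is not the attached test-sequential.sh, only a minimal illustration of the idea. The function names (run_until_failure, dhcp_once), the lease/pid file paths, and the interface name are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a "run until it fails" loop; NOT the attached script.

# run_until_failure MAX CMD [ARGS...]
# Runs CMD up to MAX times and stops at the first non-zero exit status.
# Prints the number of the failing run and returns 1, or prints 0 and
# returns 0 if all MAX runs succeeded.
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "$i"
            return 1
        fi
        i=$((i + 1))
    done
    echo 0
    return 0
}

# dhcp_once IFACE
# One DHCP attempt followed by a release. Paths are assumed; adjust them to
# whatever your system (and any AppArmor profile for dhclient) permits.
dhcp_once() {
    iface=$1
    lf="/var/lib/dhcp/dhclient.test.$iface.leases"
    pf="/run/dhclient.test.$iface.pid"
    # -1: try once to get a lease and exit non-zero on failure.
    dhclient -1 -v -lf "$lf" -pf "$pf" "$iface" || return 1
    # -r: release the lease and stop the client started above.
    dhclient -r -lf "$lf" -pf "$pf" "$iface"
}

# Example (needs root; "eth0" is an assumed interface name):
#   run_until_failure 50000 dhcp_once eth0
```

The parallel variant would start several dhcp_once-style invocations in the background with distinct lease and pid files and then wait for all of them; how closely that matches test-parallel.sh is not something this sketch claims.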

I noticed the uprev of bind9 libs in focal:
focal (net): 1:9.11.16+dfsg-3~build1
focal-updates (net): 1:9.11.16+dfsg-3~ubuntu1
impish (net): 1:9.11.19+dfsg-2.1ubuntu1
jammy (net): 1:9.11.19+dfsg-2.1ubuntu3
kinetic (net): 1:9.11.19+dfsg-2.1ubuntu3

I couldn't find a related issue on the isc-dhcp tracker or elsewhere. I did build dhclient from the Debian master branch (https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian), which uses the in-tree bind libs, and that seems to have addressed the issue in all scenarios. That doesn't help much with bisecting this just yet.