machine-id is not reset when instance-id changes

Bug #2003121 reported by Robie Basak
This bug affects 1 person
Affects: cloud-init
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

As discussed in #ubuntu-server just now, it's expected that cloud-init will ensure that machine-id is not carried over when a VM is cloned and this is detectable by an instance-id change.

This would align behaviour with ssh host key regeneration behaviour.

Actual behaviour: currently if a VM is cloned and the instance-id changes, /etc/machine-id remains the same.

Revision history for this message
Robie Basak (racb) wrote :

While experimenting with this, I found that systemd-networkd uses /etc/machine-id to determine the DHCP client identifier, and dnsmasq reissues the same lease if the client identifier is the same. So starting two cloud images using libvirt with its dnsmasq DHCP support from the same "golden image", without cloud-init resetting /etc/machine-id, results in an IP conflict between those two VMs.
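
A quick way to confirm that clones of one golden image share an ID is to compare their /etc/machine-id contents. The following is a minimal sketch, assuming the file contents have already been collected from each VM into a dict; the function name and the sample host names and IDs are hypothetical.

```python
from collections import defaultdict


def find_duplicate_machine_ids(machine_ids):
    """Return machine-ids that are shared by two or more hosts.

    machine_ids: mapping of host name -> raw contents of that host's
    /etc/machine-id (collected by whatever means is available).
    """
    hosts_by_id = defaultdict(list)
    for host, raw_id in machine_ids.items():
        # Normalise whitespace and case so trailing newlines do not hide a match.
        hosts_by_id[raw_id.strip().lower()].append(host)
    return {
        mid: sorted(hosts)
        for mid, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample data: vm-a and vm-b were cloned from the same image.
    sample = {
        "vm-a": "0123456789abcdef0123456789abcdef\n",
        "vm-b": "0123456789abcdef0123456789abcdef\n",
        "vm-c": "fedcba9876543210fedcba9876543210\n",
    }
    for mid, hosts in find_duplicate_machine_ids(sample).items():
        print(f"{mid} is shared by: {', '.join(hosts)}")
```

Any ID reported here would yield the same DHCP client identifier under systemd-networkd, which matches the lease collision described above.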

Revision history for this message
Brett Holman (holmanb) wrote :

Agreed, automating this boot-time step seems ideal from a user experience and identity correctness perspective.

Resetting machine-id is currently expected to be done by the image builder at build time. Taking responsibility for this behavior at runtime carries risk that will need to be evaluated and mitigated prior to introduction. This would require all systemd services that use machine-id to be ordered after (or potentially restarted after, if already started) whichever cloud-init service would be responsible for this behavior.

If this behavior is expected to be default in upstream cloud-init, risk is multiplied across distros, since each distro may have different services and ordering.

Also note that resetting machine-id at runtime may cause a slower boot by forcing delayed ordering of services.

Revision history for this message
Brett Holman (holmanb) wrote :

Resetting machine-id at runtime would be a significant break from current expectations, and a correct implementation would require foreknowledge of the services in an image that use machine-id. The potential for bugs due to implementation complexity, the potential for boot speed regressions caused by services waiting until machine-id is reset, and the expected future maintenance burden from changes in services and variation across Ubuntu and other distros make the perceived risk of this feature outweigh the benefit. These complexity, risk, and boot speed issues are not present when machine-id is correctly reset at build time, so I'm hesitant to move forward with this request.

I'll mark this "Won't Fix" for now.

In the meantime, I'd like to point users experiencing the same issue towards our build recommendation[1], specifically the --machine-id option.

[1] https://cloudinit.readthedocs.io/en/latest/reference/cli.html#clean
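
For image builders who script their pipeline, the build-time step could look roughly like this. It is a minimal sketch, assuming `cloud-init` is installed in the image and the script runs as root as the last step before the image is captured; the helper names `build_clean_command` and `clean_image` are hypothetical, and only the `clean` subcommand with its `--logs` and `--machine-id` options comes from the cloud-init CLI.

```python
import subprocess


def build_clean_command(reset_machine_id=True, remove_logs=False):
    """Assemble the `cloud-init clean` invocation as an argument list."""
    cmd = ["cloud-init", "clean"]
    if remove_logs:
        cmd.append("--logs")
    if reset_machine_id:
        # Asks cloud-init to reset /etc/machine-id so that a fresh ID is
        # generated on the next boot of each instance made from the image.
        cmd.append("--machine-id")
    return cmd


def clean_image(dry_run=True):
    """Run (or, by default, only print) the clean command."""
    cmd = build_clean_command(remove_logs=True)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    # check=True raises CalledProcessError if cloud-init exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    clean_image(dry_run=True)
```

The dry-run default is a deliberate choice in this sketch, since running the command on a live system discards cloud-init state.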

Changed in cloud-init:
status: New → Won't Fix
Revision history for this message
Chad Smith (chad.smith) wrote :

"it's expected that cloud-init will ensure that machine-id is not carried over when a VM is cloned and this is detectable by an instance-id change."

I'm not sure that statement above is wholly correct.

The instance-id delta is triggered in more cases than just a clone and first instance boot event.

In recent history (~5 years), some clouds have triggered instance-id changes on the following events to force cloud-init to re-perform all configuration on next boot (or sometimes hotplug NIC configuration):
 - network configuration changes, NIC add/remove
 - user-data changes or vendor-data changes
 - vm clone and cloned image relaunch

Here is systemd's documented stance on machine-id changes, per `man machine-id`:

    The machine ID does not change based on local or network configuration
    or when hardware is replaced. Due to this and its greater length, it is
    a more useful replacement for the gethostid(3) call that POSIX
    specifies.

Trying to fold /etc/machine-id regeneration into every instance-id change for cloud-init will be tough to support until we have:
  1. cloud-init able to compare the previously cached instance data with the current metadata from the cloud's instance metadata service, to determine whether the scope of the config changes is limited to just network or storage, and so avoid regenerating the machine-id unnecessarily
  2. an assurance that systemd and systemd-networkd can react appropriately to an updated machine-id on the booting system after networkd is already active

The reason for #2 is that cloud-init is only able to detect instance metadata after the network is already active on the system, and restarting systemd-networkd later in boot is more likely to expose a number of other racy problems.

We may take a look at this further, but the conditions under which we want cloud-init to magically regenerate /etc/machine-id and cope with systemd ordering/costs would need to be limited in scope to avoid triggering other concerns.
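
The comparison described in point 1 could be sketched as follows. This is only an illustration of the idea, not how cloud-init works today: the flat dict layout, the `network` and `storage` key names, and both function names are assumptions made for the example.

```python
# Hypothetical top-level metadata keys whose changes alone should NOT
# trigger a machine-id regeneration.
NETWORK_OR_STORAGE_KEYS = frozenset({"network", "storage"})


def changed_keys(previous, current):
    """Return the top-level keys whose values differ between two snapshots.

    A key that is missing on one side is compared as None.
    """
    return {
        key
        for key in previous.keys() | current.keys()
        if previous.get(key) != current.get(key)
    }


def should_regenerate_machine_id(previous, current):
    """Decide whether an instance-id change looks like more than a
    network/storage reconfiguration."""
    changed = changed_keys(previous, current)
    if "instance-id" not in changed:
        return False  # same instance, nothing to do
    others = changed - {"instance-id"}
    # Skip regeneration only when there are other changes and every one of
    # them is confined to network or storage.
    limited_to_net_or_storage = bool(others) and others <= NETWORK_OR_STORAGE_KEYS
    return not limited_to_net_or_storage


if __name__ == "__main__":
    cached = {"instance-id": "i-1", "network": {"nics": 1}, "user-data": "a"}
    nic_added = {"instance-id": "i-2", "network": {"nics": 2}, "user-data": "a"}
    print(should_regenerate_machine_id(cached, nic_added))
```

Even with such a check in place, point 2 still applies: the decision can only be made after networking is up, so the ordering concern remains.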

Revision history for this message
Brett Holman (holmanb) wrote :

A couple of related details regarding machine-id induced IP collisions:

The duplicate IP caused by duplicate machine-id will not happen with NetworkManager in Focal and later (NetworkManager versions >1.15) by default due to this change[1]. It is still possible to trigger it by setting `ipv4.dhcp-client-id=duid`.

systemd is unlikely to follow NetworkManager[2], because doing so would only mask the bigger problem (duplicate machine-id on multiple machines).

Therefore, the duplicate IP symptom is limited to distros using systemd-networkd. However, the underlying machine-id issue affects all distros.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/cfd696cc3cf43f5f510046b757949546bcee4cdc
[2] https://github.com/systemd/systemd/issues/9609#issuecomment-776277655

Revision history for this message
James Falcon (falcojr) wrote :