cloud-init no longer sets hostname on first boot

Bug #1991261 reported by Dave Jones
This bug affects 1 person
Affects: cloud-init (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

During ISO testing of the beta Ubuntu Server 22.10 for Raspberry Pi image, on a Raspberry Pi 4B and 3B, I noted that cloud-init is no longer setting the hostname on the first boot. It *does* correctly output the requested value to /etc/hostname and on subsequent boots the hostname is correct, but it's stuck at the default "ubuntu" for the very first boot (which isn't ideal if the box is headless).

This was working correctly with the jammy images, so it would appear the regression occurred somewhere between 22.2 and 22.3.3.

Dave Jones (waveform)
tags: added: raspi-image rls-kk-incoming
Dave Jones (waveform) wrote :

Updated description after confirming issue also exists on arm64 images.

description: updated
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
https://iso.qa.ubuntu.com/qatracker/reports/bugs/1991261

tags: added: iso-testing
Chad Smith (chad.smith) wrote :

Thanks for filing this bug, Dave, and for making cloud-init better.
If possible please attach the tar.gz from `sudo cloud-init collect-logs` for better triage about this regression.

There are two changes that landed in cloud-init that could potentially affect hostname setting in this release:
- [1] one change altered the frequency of the cc_hostname module from PER_ALWAYS to PER_INSTANCE, so if the install process on a Raspberry Pi involves multiple boots and the instance-id in metadata doesn't change between them (a changed instance-id being what signals a metadata change), cloud-init won't apply that hostname correctly
- [2] one change altered certain scenarios when hostname updates are ignored

References:
- [1] https://github.com/canonical/cloud-init/pull/1651
- [2] https://github.com/canonical/cloud-init/pull/1453

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Chad Smith (chad.smith) wrote :

Marking incomplete temporarily while we await cloud-init logs. Please set back to New once logs are attached so we can get back on this and understand the failure mode. Thanks again

Dave Jones (waveform) wrote :

Attaching requested logs. Some additional details for reference:

The Pi Server images are pre-installed, so there are no reboots during installation (because there is no "installation" as such; cloud-init just goes through the motions as it does on regular cloud images, but with a seed on the FAT boot partition).

The hostname *is* being set, and is set on first boot but it appears to be happening "later" than it used to (I'm guessing after DHCP has started on the interface). In other words, on first login, the hostname reported (from /etc/hostname) is correct (in the attached logs you'll see it's "miss-piggy" -- all my Pis get muppet names). However, the machine's name on the network (derived from the DHCP client id) is still "ubuntu". On subsequent reboots it's all correct.

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Chad Smith (chad.smith) wrote :

Dave, thanks again! What isn't attached is the user-data that was provided at launch. If you get a chance, run `sudo cloud-init query userdata` and, if that user-data is clean enough to attach, that'd be great
   [NOTE] make sure to scrub miss-piggy's password or any other sensitive data before attaching

 What I see in the instance data is that the NoCloud datasource config was read from the mounted config disk:
  "subplatform": "config-disk (/dev/mmcblk0p1)".

The metadata file exposed to cloud-init has only

  "meta_data": {
   "dsmode": "net",
   "instance-id": "nocloud",
   "instance_id": "cloud-image"
  }

And for some reason the local-hostname determined on this system is
  "local_hostname": "ubuntu" instead of what I expect would have been miss-piggy.

This can be seen with `sudo cloud-init query --all` on this system or /run/cloud-init/instance-data-sensitive.json

I'd generally expect either config drive meta-data to specify "local-hostname": "miss-piggy" or your user-data during launch.
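A minimal sketch of that check, assuming the instance data has been dumped to JSON (for example from `sudo cloud-init query --all`). The helper name `find_key` is hypothetical, and it searches the whole document rather than assuming where a key lives, because the exact nesting of the JSON is not shown in this thread:

```python
import json


def find_key(node, key):
    """Return every value stored under `key` anywhere in a nested JSON structure."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Sample trimmed to the fields quoted in this thread.
# The nesting under "ds" is an assumption made for illustration.
sample = json.loads("""
{
  "ds": {
    "meta_data": {
      "dsmode": "net",
      "instance-id": "nocloud",
      "instance_id": "cloud-image"
    }
  },
  "local_hostname": "ubuntu",
  "subplatform": "config-disk (/dev/mmcblk0p1)"
}
""")

# What cloud-init resolved as the hostname
print(find_key(sample, "local_hostname"))
# Whether meta-data supplied a hostname at all (empty list means it did not)
print(find_key(sample, "local-hostname"))
```

On the sample above the second lookup comes back empty, which matches the observation that the meta-data carries no `local-hostname` key.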

The reason I'd also like to get a look at user-data is because cloud-init is also giving some warnings about deprecation of certain #cloud-config formatting choices, that it's probably best we also sort in Raspberry Pi setup/test infrastructure. Eventually those deprecations will break your test harness.

Sep 29 18:27:11.316479 ubuntu cloud-init[793]: chpasswd.list: DEPRECATED: List of ``username:password`` pairs. Each user will have the corresponding password set. A password can be randomly generated by specifying ``RANDOM`` or ``R`` as a user's password. A hashed password, created by a tool like ``mkpasswd``, can be specified. A regex (``r'\$(1|2a|2y|5|6)(\$.+){2}'``) is used to determine if a password value should be treated as a hash.
Sep 29 18:27:11.316479 ubuntu cloud-init[793]: Use of a multiline string for this field is DEPRECATED and will result in an error in a future version of cloud-init.
Sep 29 18:27:12.909071 ubuntu kernel: EXT4-fs (mmcblk0p2): resizing filesystem from 931286 to 3823739 blocks

In this deployment, it looks like there is some logic in cloud-init that is incorrectly trying to review/set up the hostname

2022-09-29 18:27:14,065 - cc_set_hostname.py[DEBUG]: Setting the hostname to ubuntu (miss-piggy)

Normally I'd expect to see miss-piggy (miss-piggy) in that log message as the format is:
Setting the hostname to <fqdn> (<hostname>)

Digging into why 'ubuntu' is being discovered in 22.3.3 as potential fqdn reference instead of miss-piggy, but your user-data will help shed light on that.
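To spot this kind of mismatch in a log, a small sketch along these lines may help. It assumes only the message format quoted above; the function name `parse_set_hostname` and the regular expression are illustrative and not part of cloud-init:

```python
import re

# Matches the message format quoted above: Setting the hostname to <fqdn> (<hostname>)
LOG_RE = re.compile(
    r"Setting the hostname to (?P<fqdn>\S+) \((?P<hostname>[^)\s]+)\)"
)


def parse_set_hostname(line):
    """Return (fqdn, hostname) from a cc_set_hostname log line, or None if absent."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    return match.group("fqdn"), match.group("hostname")


line = (
    "2022-09-29 18:27:14,065 - cc_set_hostname.py[DEBUG]: "
    "Setting the hostname to ubuntu (miss-piggy)"
)
fqdn, hostname = parse_set_hostname(line)

# The first label of the fqdn would normally be the short hostname
if fqdn.split(".")[0] != hostname:
    print(f"mismatch: fqdn={fqdn!r} hostname={hostname!r}")
```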

Chad Smith (chad.smith) wrote :

<EDIT> The reason for "systemd[1] Hostname set to <ubuntu>" is general systemd behavior and expected when /etc/hostname is present with 'ubuntu', systemd will automatically set this hostname even before cloud-init init --local boot stage is entered.

=== original comment ====
I'm seeing in early boot, before cloud-init even starts and even before /etc/machine-id is created, that systemd is getting involved in setting up the hostname.

Sep 29 18:27:02.142455 ubuntu systemd[1]: Hostname set to <ubuntu>.

Are we providing kernel cmdline params like: systemd.hostname= ?
https://www.freedesktop.org/software/systemd/man/hostname.html#Hostname%20semantics

This might be why cloud-init is tripping over host name setup, and systemd overriding what cloud-init sets up.

Dave Jones (waveform) wrote :

No problem; the user-data in use is as follows (I haven't bothered scrubbing anything because this is just the one I use for ephemeral testing; the password isn't remotely secure and isn't intended to be -- in fact it's public in my dotfiles repo somewhere!):

  #cloud-config

  hostname: miss-piggy

  chpasswd:
    expire: false
    list:
    - ubuntu:raspberry

  keyboard:
    model: pc105
    layout: gb
    options: ctrl:nocaps

  ssh_import_id:
  - lp:waveform

  apt:
    conf: |
      Acquire::http { Proxy "http://acng.waveform.org.uk:3142"; }

And the network-config is as follows (minus a ton of default comments):

  version: 2
  ethernets:
    eth0:
      dhcp4: true
      optional: true

I'd noticed the warnings about deprecation but they don't make much sense to me: "Use of a multiline string for this field is DEPRECATED" implies I'm using a multi-line string in chpasswd.list ... but it's a list, not a multi-line string.

On kernel command line parameters, the cmdline.txt on the boot partition contains:

  console=serial0,115200 dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc quiet splash

This is the default on the Pi images. However, it does get manipulated by the bootloader, so this is what it looks like from /proc/cmdline after boot:

  coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_headphones=1 snd_bcm2835.enable_hdmi=1 bcm2708_fb.fbwidth=1920 bcm2708_fb.fbheight=1080 bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:31:6F:7C vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 console=ttyS0,115200 dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc quiet splash

In any case, I'm pretty confident there's no hostname specified on the kernel command line.
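For anyone wanting to repeat that check, here is a rough sketch that looks for a given `key=value` parameter on a kernel command line. The helper name `cmdline_params` is made up for illustration, and the simple whitespace split assumes no quoted values containing spaces:

```python
def cmdline_params(cmdline, key):
    """Return the values of every `key=value` token on a kernel command line."""
    prefix = key + "="
    return [
        token[len(prefix):]
        for token in cmdline.split()
        if token.startswith(prefix)
    ]


# Tail of the /proc/cmdline contents quoted above
cmdline = (
    "console=ttyS0,115200 dwc_otg.lpm_enable=0 console=tty1 "
    "root=LABEL=writable rootfstype=ext4 rootwait fixrtc quiet splash"
)

# An empty list means no hostname was passed on the command line
print(cmdline_params(cmdline, "systemd.hostname"))
print(cmdline_params(cmdline, "console"))
```

On a running system the string would come from reading `/proc/cmdline` instead of the literal used here.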

Brett Holman (holmanb) wrote :

Hi Dave,

The way I read the systemd logs you provided the system hostname appears to be getting set correctly[1] (e.g. I would expect the output of `hostname` to be miss-piggy on your system).

Can you please elaborate on how you see the name on the network?

Are you seeing this name by running `hostname`? Or some other method?

[1] The hostname switches to miss-piggy on line 674 in journal.txt and doesn't switch back for the rest of the file.

Brett Holman

Brett Holman (holmanb) wrote :

There are a couple different potential theories about what may have happened, but I think we may need more information to understand what went wrong. I set this to incomplete for now.

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Chad Smith (chad.smith) wrote :

Dave,

  Thanks for the work here and answering the questions we have above.

I think I see the problem here:

 `hostname: miss-piggy` is only taking effect in cloud-init's init-network stage because it is in user-data and not meta-data. User-data is only processed and acted upon by cloud-init once we hit "init-network", by which point the network has already been configured and brought up by systemd-networkd or its ilk. Because the network is already up by the time cloud-init sets the hostname, the initial DHCP request on eth0 with the host-name option already leaked 'ubuntu' to your DDNS setup.

If the hostname were set in init-local timeframe, it would be correctly configured before the first DHCP client request goes out on the network.

To have hostname set in init-local timeframe before initial network DHCP request, that data needs to be in meta-data as the `local-hostname` key instead of in user-data as the `hostname` config because your nocloud datasource is configured to run in init-network due to `dsmode: net` in meta-data.

I believe this behavior is the same on 22.2 as 22.3 for these settings, so I'm not sure how that managed to not cause the same problems on 22.2.

When I look at the metadata in your attached logs I see:

  "meta_data": {
   "dsmode": "net",
   "instance-id": "nocloud",
   "instance_id": "cloud-image"
  }

Minimally I think you can drop that "instance_id": "cloud-image" directive unless you have other uses for it as cloud-init only pays attention to "instance-id" (note the hyphen not underscore).

If we have control over that config-disk meta-data, and you know you need setting of hostname before initial DHCP request there are two options:

1. provide "local-hostname": "miss-piggy" in meta-data, not in #cloud-config user-data

  "meta_data": {
   "dsmode": "net",
   "instance-id": "nocloud",
   "local-hostname": "miss-piggy"
  }

-- OR --

2. Set dsmode to "local" which would process all known user-data (your #cloud-config\nhostname: ) during init-local timeframe.

  "meta_data": {
   "dsmode": "local",
   "instance-id": "nocloud"
  }
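The reasoning above can be summarised as a small decision function. This is only a model of the explanation in this comment, not cloud-init's actual code; the function name `hostname_stage` is hypothetical, and treating any dsmode other than "local" as init-network is an assumption based on the meta-data shown in the logs:

```python
def hostname_stage(meta_data, user_data):
    """Return (stage, hostname) following the explanation in this thread.

    A simplified model for illustration, not cloud-init's implementation.
    """
    # A hostname in meta-data is available before networking comes up
    if "local-hostname" in meta_data:
        return "init-local", meta_data["local-hostname"]
    # A hostname in user-data is applied when the datasource is processed;
    # assumption: anything other than dsmode "local" means init-network
    if "hostname" in user_data:
        if meta_data.get("dsmode") == "local":
            return "init-local", user_data["hostname"]
        return "init-network", user_data["hostname"]
    return None, None


user_data = {"hostname": "miss-piggy"}

# Configuration seen in the attached logs
print(hostname_stage({"dsmode": "net", "instance-id": "nocloud"}, user_data))
# Option 1: hostname moved into meta-data
print(hostname_stage(
    {"dsmode": "net", "instance-id": "nocloud", "local-hostname": "miss-piggy"}, {}
))
# Option 2: dsmode switched to local
print(hostname_stage({"dsmode": "local", "instance-id": "nocloud"}, user_data))
```

Only the first case resolves to init-network, which is the case where the first DHCP request can go out before the hostname is set.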

Chad Smith (chad.smith) wrote :

And per the schema warning: yes, that message is too big to be helpful I think; we can probably distill it a bit more per specific use-case. The point is that the whole 'list' key is actually deprecated in favor of explicit password type definitions.

So instead of:

  chpasswd:
    expire: false
    list:
    - ubuntu:raspberry

cloud-init now wants:
```
  chpasswd:
    expire: false
    users:
      - name: ubuntu
        password: raspberry
        type: text
```

We'll sort the word-smithing a bit but the details are here
https://cloudinit.readthedocs.io/en/latest/topics/modules.html#set-passwords
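As a rough sketch of that migration, the deprecated `username:password` entries could be converted mechanically. The hash regex is the one quoted in the deprecation warning earlier in this thread; the function name `convert_chpasswd_list` is hypothetical, and the `hash` and `RANDOM` type values are my reading of the linked module documentation, so treat them as assumptions to verify there:

```python
import re

# Regex quoted in the cloud-init deprecation warning for detecting hashed passwords
HASH_RE = re.compile(r"\$(1|2a|2y|5|6)(\$.+){2}")


def convert_chpasswd_list(entries):
    """Convert deprecated ``username:password`` strings into ``users`` entries."""
    users = []
    for entry in entries:
        # Split on the first colon only, since hashes may contain more
        name, password = entry.split(":", 1)
        if password in ("RANDOM", "R"):
            # Assumption: random passwords carry a type but no password value
            users.append({"name": name, "type": "RANDOM"})
        elif HASH_RE.match(password):
            users.append({"name": name, "password": password, "type": "hash"})
        else:
            users.append({"name": name, "password": password, "type": "text"})
    return users


print(convert_chpasswd_list(["ubuntu:raspberry"]))
```

The resulting list is what would go under `chpasswd: users:` in the #cloud-config, replacing the old `list:` key.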

Chad Smith (chad.smith) wrote :

And thanks for confirming that the kernel cmdline doesn't provide a hostname. Agreed. The 'ubuntu' hostname simply came from the original image containing /etc/hostname with "ubuntu" in it. systemd helpfully sets the hostname early in boot, before cloud-init even gets involved, which we can see in `journalctl -b 0` as

    "systemd[1]: Hostname set to <ubuntu>." prior to any cloud-init involvement.

Dave Jones (waveform) wrote :

In response to holmanb:

> Can you please elaborate on how you see the name on the network?
>
> Are you seeing this name by running `hostname`? Or some other
> method?

hostname returns miss-piggy (as expected), but the DHCP server
received "ubuntu" as the client id, and hence other machines see the
Pi as "ubuntu". This is different from prior releases where the Pi
would show up as "miss-piggy" to other machines (presumably, and this
is just speculation on my part, because the hostname was set "earlier"
before DHCP kicked off?).

In response to chad.smith:

> If the hostname were set in init-local timeframe, it would be
> correctly configured before the first DHCP client request goes out
> on the network.

> To have hostname set in init-local timeframe before initial network
> DHCP request, that data needs to be in meta-data as the
> `local-hostname` key instead of in user-data as the `hostname`
> config because your nocloud datasource is configured to run in
> init-network due to `dsmode: net` in meta-data.

Well, taking a look at the set-hostname module reference in the
cloud-init documentation [1]:

"This module *will run in the init-local stage* before networking is
configured if the hostname is set by metadata *or user data* on the
local system." (emphasis added)

[1]: https://cloudinit.readthedocs.io/en/latest/topics/modules.html#set-hostname

> I believe this behavior is the same on 22.2 as 22.3 for these
> settings, so I'm not sure how that managed to not cause the same
> problems on 22.2.

I can't be absolutely certain about the versions but I can say that
the jammy .1 and kinetic beta images reliably behave differently. I
flashed both several times yesterday and the jammy images happily DHCP
their expected hostname each time, while the kinetic images don't.

> When I look at the metadata in your attached logs I see:
>
> "meta_data": {
> "dsmode": "net",
> "instance-id": "nocloud",
> "instance_id": "cloud-image"
> }
[snip]
> 2. Set dsmode to "local" which would process all known user-data
> (your #cloud-config\nhostname: ) during init-local timeframe.

Ohhhh... now that's interesting! Why is dsmode "net"? That should
almost certainly be "local" (for instance we have a 99-fake_cloud.cfg
in /etc/cloud/cloud.cfg.d that disables all data-sources except
nocloud, and redirects its seed to the partition labeled
"system-boot", which happens to be the FAT partition at the start of
the image).

I shall go and check if this has changed between jammy and kinetic...
Thanks for the hints!

Dave Jones (waveform) wrote :

Apparently dsmode="net" is the case on both the jammy .1 and the kinetic images.

Still, I can always try setting dsmode="local" in meta-data and see if that fixes things up; that would be a nice simple work-around as users could keep specifying their configuration in user-data, and we'd just need one extra line in our (currently extremely minimal) meta-data.

Currently, our meta-data file consists solely of:

  instance_id: cloud-image

I have some vague recollection of trying a completely blank file and it causing issues, but that's probably a long time ago now!

Dave Jones (waveform) wrote :

Good news! Adding dsmode: local to meta-data appears to fix things nicely. A couple of questions spring to mind though, particularly since I can't seem to find much documented about "dsmode" in the cloud-init documentation -- there're several places that mention it, but all are cloud-specific sources, none of which I'm using (e.g. [1]). The instance meta-data chapter [2] shows it in an example but otherwise doesn't describe it.

[1]: https://cloudinit.readthedocs.io/en/latest/topics/datasources/configdrive.html?highlight=dsmode#keys-and-values

[2]: https://cloudinit.readthedocs.io/en/latest/topics/instancedata.html?highlight=dsmode#using-instance-data

Anyway, onto the questions. Will this break anyone who is tweaking their kernel command line to use a networked seed?

Context: on occasion in the past, I've set up a quick http server and tweaked the kernel command on a freshly flashed image (editing cmdline.txt on the boot partition) to point to that server as a seed. Will setting dsmode to "local" prevent someone from using a seed in this way without removing that from meta-data? (my guess on this is "yes", in which case I ought to document that in release notes)

Also, should dsmode actually be dsmode="net" in the case we're using NoCloud as the only data source?

Context: I note there's also NoCloudNet in the code for that datasource (although it doesn't appear to be documented). I would've guessed that NoCloud would set dsmode="local" implicitly, but NoCloudNet would use dsmode="net"? Is that a bug in its own right, or am I barking up the wrong tree with this? (looking at the other uses of dsmode, it seems to be a configuration option of a data-source)

Dave Jones (waveform)
Changed in cloud-init (Ubuntu):
status: Incomplete → New
Dave Jones (waveform) wrote :

Oh dear, I'm afraid I may have been barking up the wrong tree with this one...

As requested on IRC, I went to gather cloud-init collect-logs from a jammy instance and was surprised to find that, unlike last week it wasn't successfully "pingable" after the first boot. I wondered if I'd used the wrong image (release instead of .1 for instance), but no. It appears I've been seeing stale DNS records! Because my test images usually share a hostname (miss-piggy) and because I'm often just switching the card but not the Pi itself, it winds up with the same DHCP address (because dnsmasq typically attempts to use predictable addresses derived from the MAC). So it's just seeing the "miss-piggy" name from a prior reboot, with the same address...

So: my apologies for the invalid report! On the plus side, I re-tested the dsmode=local work-around with different names and confirmed it definitely sets things correctly on first boot, so that change can go into kinetic (and future releases).

I *would* close this as invalid, but for the point about the documentation of set-hostname which could probably do with some clarification? And the possible confusion about NoCloud using dsmode=net? Anyway, I leave it to your best judgement!

Chad Smith (chad.smith) wrote :

Thanks Dave for the effort here.
I think we'll close this particular bug as invalid and open a documentation bug to better represent configuration implications due to shifting a datasource from dsmode:net to dsmode:local.
https://bugs.launchpad.net/cloud-init/+bug/1991677

Changed in cloud-init (Ubuntu):
status: New → Invalid
Chad Smith (chad.smith) wrote :

Specific to your question in #16
"Anyway, onto the questions. Will this break anyone who is tweaking their kernel command line to use a networked seed?"

What `dsmode:local` vs `dsmode:net` tells cloud-init is whether or not the datasource environment is considered 'up and available', and whether it can run supplemental user-data parts such as:
- ShellScriptByFreqPartHandler
- BootHookPartHandler

What this translates to is that anyone providing the following user-data types would see those types run slightly before the network is fully configured on the system, so network egress operations in complex environments may result in timeouts:
- https://cloudinit.readthedocs.io/en/latest/topics/format.html?highlight=boothook#cloud-boothook
- https://cloudinit.readthedocs.io/en/latest/topics/format.html#user-data-script

This shift is probably not an issue for most use-cases of cloud-init. A number of clouds have transitioned to init-local dsmode timeframe from init-network because of the slight speedup in boot time by emitting full network configuration prior to the system network being brought up.
