Comment 4 for bug 2016908

Francis Ginther (fginther) wrote : Re: Unable to deploy hosts with lunar images after 20230319 - fails to connect and download squashfs

The root of the problem is that the kernel's networking modules are not being loaded, but I have no idea why yet. Comparing against the logs from an earlier MAAS deployment that successfully loads the squashfs, I see this:

Starting systemd-udevd version 252.5-2ubuntu2
[ 24.561035] IPMI message handler: version 39.2
[ 24.579284] ipmi device interface
[ 24.589680] xhci_hcd 0004:03:00.0: Adding to iommu group 37
[ 24.594182] ACPI: bus type drm_connector registered
[ 24.596816] xhci_hcd 0004:03:00.0: failed to load firmware renesas_usb_fw.mem, fallback to ROM
[ 24.601479] ipmi_ssif: IPMI SSIF Interface driver
[ 24.608865] xhci_hcd 0004:03:00.0: xHCI Host Controller
[ 24.617397] ipmi_si: IPMI System Interface driver
[ 24.618771] xhci_hcd 0004:03:00.0: new USB bus registered, assigned bus number 1
[ 24.626843] ipmi_si: Unable to find any System Interface(s)
[ 24.626990] ipmi_ssif i2c-AMPC0004:00: ipmi_ssif: Trying ACPI-specified SSIF interface at i2c address 0x10, adapter Synopsys DesignWare I2C adapter, slave address 0x20
[ 24.630860] xhci_hcd 0004:03:00.0: Zeroing 64bit base registers, expecting fault
[ 24.768420] ipmi_ssif i2c-AMPC0004:00: IPMI message handler: Found new BMC (man_id: 0x00cd3a, prod_id: 0x0082, dev_id: 0x20)
[ 24.800956] xhci_hcd 0004:03:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x0000001100000410
[ 24.814231] xhci_hcd 0004:03:00.0: xHCI Host Controller
[ 24.819491] xhci_hcd 0004:03:00.0: new USB bus registered, assigned bus number 2
[ 24.826888] xhci_hcd 0004:03:00.0: Host supports USB 3.0 SuperSpeed
...
[ 24.960323] mlx5_core 0000:01:00.0: Adding to iommu group 39
[ 24.967051] mlx5_core 0000:01:00.0: firmware version: 14.26.1040
[ 24.973085] mlx5_core 0000:01:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
...
+ many many more lines
Begin: Loading essential drivers ... [ 28.602841] raid6: neonx8 gen() 15741 MB/s

That section of the log is where the network device gets recognized (mlx5_core in this case). In the failing case, none of this appears:

[ 21.660837] Run /init as init process
Loading, please wait...
Starting systemd-udevd version 252.5-2ubuntu3
Begin: Loading essential drivers ... [ 25.605516] raid6: neonx8 gen() 13160 MB/s
[ 25.677515] raid6: neonx4 gen() 11952 MB/s
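A rough way to make this comparison mechanical is sketched below in Python, under the assumption that the first token after each kernel timestamp is usually a driver or subsystem name. It pulls those tokens out of two dmesg excerpts and lists the ones that appear only in the working boot. The regex and the `drivers_in` helper are hypothetical names for illustration, and the heuristic is crude (it also picks up non-driver words such as `Run`), but on the excerpts above it flags `mlx5_core` and `xhci_hcd` as missing from the failing boot.

```python
import re

# Kernel log lines look like "[ 24.960323] mlx5_core 0000:01:00.0: ...".
# Capture the first token after the timestamp (usually a driver/subsystem name).
# search() rather than match() also catches lines where console output such as
# "Begin: Loading essential drivers ..." precedes the timestamp.
TOKEN_RE = re.compile(r"\[\s*\d+\.\d+\]\s+([A-Za-z0-9_-]+)")


def drivers_in(log_text):
    """Return the set of leading tokens found after kernel timestamps."""
    names = set()
    for line in log_text.splitlines():
        m = TOKEN_RE.search(line)
        if m:
            names.add(m.group(1))
    return names


# Trimmed excerpts of the two boots quoted above.
working = """\
[ 24.561035] IPMI message handler: version 39.2
[ 24.589680] xhci_hcd 0004:03:00.0: Adding to iommu group 37
[ 24.960323] mlx5_core 0000:01:00.0: Adding to iommu group 39
Begin: Loading essential drivers ... [ 28.602841] raid6: neonx8 gen() 15741 MB/s
"""

failing = """\
[ 21.660837] Run /init as init process
Begin: Loading essential drivers ... [ 25.605516] raid6: neonx8 gen() 13160 MB/s
[ 25.677515] raid6: neonx4 gen() 11952 MB/s
"""

missing = sorted(drivers_in(working) - drivers_in(failing))
print(missing)  # ['IPMI', 'mlx5_core', 'xhci_hcd']
```

Run against the full serial-console logs instead of these excerpts, the same set difference would list every driver that probed in the working boot but never showed up in the failing one.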

I did notice that the failing image has a newer udev (252.5-2ubuntu3, versus 252.5-2ubuntu2 in the working one), so I suspect udev or its rules may be the source of the issue. As an experiment, I copied all of the files in the udev package from an older lunar install and repacked them into the boot-initrd. This didn't help, although systemd-udevd did report the version from the system I copied from (further confirming that the boot-initrd is being loaded and used).

I'm going to keep experimenting with this. My first attempt only copied the rules shipped in the udev package itself; other packages install rules too, and I haven't copied those yet.
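Rather than copying one package's rules at a time, it may help to diff the complete set of udev rules between a working and a failing initrd. The Python sketch below assumes both initrds have already been unpacked into directories (for example with `unmkinitramfs` from initramfs-tools); the list of rules directories and the `rules_files`/`compare_rules` helpers are hypothetical names for illustration. It reports rules present only in the working initrd, only in the failing one, and rules whose contents differ.

```python
from pathlib import Path

# Assumed rules locations inside an unpacked initrd. When the same file name
# appears in more than one directory, the earlier entry wins, mirroring udev's
# rule that /etc/udev/rules.d overrides same-named files under /usr/lib.
RULES_DIRS = ("etc/udev/rules.d", "usr/lib/udev/rules.d", "lib/udev/rules.d")


def rules_files(root):
    """Map each *.rules file name to its contents (bytes) under an unpacked initrd."""
    found = {}
    for rel in RULES_DIRS:
        rules_dir = Path(root) / rel
        if rules_dir.is_dir():
            for path in sorted(rules_dir.glob("*.rules")):
                # setdefault keeps the first (highest-precedence) copy seen.
                found.setdefault(path.name, path.read_bytes())
    return found


def compare_rules(good_root, bad_root):
    """Return (only_in_good, only_in_bad, differing) sorted lists of rule names."""
    good, bad = rules_files(good_root), rules_files(bad_root)
    only_good = sorted(good.keys() - bad.keys())
    only_bad = sorted(bad.keys() - good.keys())
    differing = sorted(n for n in good.keys() & bad.keys() if good[n] != bad[n])
    return only_good, only_bad, differing
```

If `unmkinitramfs` splits out an early (microcode) archive, the main filesystem typically lands in a `main/` subdirectory, so the call would look something like `compare_rules("working/main", "failing/main")` (paths hypothetical). Rules from other packages that the first experiment didn't copy would show up in the first list.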