cloud-init fails to detect iSCSI root on focal Oracle instances

Bug #1872813 reported by Dan Watkins
This bug affects 1 person
Affects              Status        Importance  Assigned to    Milestone
cloud-init           Invalid       Undecided   Dan Watkins
open-iscsi (Ubuntu)  Fix Released  Undecided   Dan Watkins
  Bionic             Fix Released  High        Jorge Merlino
  Focal              Fix Released  Undecided   Dan Watkins

Bug Description

[Impact]

When creating a bare metal instance on Oracle Cloud (these are backed by an iSCSI disk), the IP address is configured on one interface (enp45s0f0) at boot, but cloud-init generates a /etc/netplan/50-cloud-init.yaml with an entry that configures enp12s0f0 using DHCP. As a result, enp12s0f0 sends a DHCPREQUEST and waits for a reply until it times out, delaying the boot process, because no DHCP server serves this interface.
This is caused by a missing /run/initramfs/open-iscsi.interface file, which should point to the enp45s0f0 interface.

[Fix]

A script in the open-iscsi package checks whether any iSCSI disks are present and, if there are none, removes the /run/initramfs/open-iscsi.interface file, which records the interface through which the iSCSI disk is reached.

This script originally ran with the local-top initrd scripts, but it uses /dev/disk/by-path/ to determine whether iSCSI disks are present. That path does not yet exist when the local-top scripts run, so the file was always removed.

This was fixed in Focal by moving the script to run with the local-bottom scripts. By the time these scripts run, /dev/disk/by-path/ exists.

[Test Plan]

This can be reproduced by launching any bare metal instance on Oracle Cloud (all are backed by an iSCSI disk) and checking whether the /run/initramfs/open-iscsi.interface file is present.

[Where problems could occur]

No problems are expected, as the script still runs, just later in the boot process.

If the script fails to run, it could leave the open-iscsi.interface file present with no iSCSI drives, but that should cause no issues other than delaying the boot process.

[Original description]

Currently focal images on Oracle are failing to get data from the Oracle DS with this traceback:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 772, in find_source
    if s.update_metadata([EventType.BOOT_NEW_INSTANCE]):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 661, in update_metadata
    result = self.get_data()
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 279, in get_data
    return_value = self._get_data()
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 195, in _get_data
    with dhcp.EphemeralDHCPv4(net.find_fallback_nic()):
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 57, in __enter__
    return self.obtain_lease()
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 109, in obtain_lease
    ephipv4.__enter__()
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 1019, in __enter__
    self._bringup_static_routes()
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 1071, in _bringup_static_routes
    util.subp(
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2084, in subp
    raise ProcessExecutionError(stdout=out, stderr=err,
cloudinit.util.ProcessExecutionError: Unexpected error while running command.
Command: ['ip', '-4', 'route', 'add', '0.0.0.0/0', 'via', '10.0.0.1', 'dev', 'ens3']
Exit code: 2
Reason: -
Stdout:
Stderr: RTNETLINK answers: File exists

In https://github.com/canonical/cloud-init/blob/46cf23c28812d3e3ba0c570defd9a05628af5556/cloudinit/sources/DataSourceOracle.py#L194-L198, we can see that this path is only taken if _is_iscsi_root returns False.

Dan Watkins (oddbloke) wrote :
Changed in cloud-init:
assignee: nobody → Dan Watkins (daniel-thewatkins)
status: New → In Progress
Dan Watkins (oddbloke) wrote :

The following conditions must currently be met for cloud-init to detect iSCSI root (symbols are in cloudinit.net.cmdline unless otherwise specified):

* _get_klibc_net_cfg_files() must return something; its implementation is: return glob.glob('/run/net-*.conf') + glob.glob('/run/net6-*.conf')

AND one of

* 'ip=' in self._cmdline or 'ip6=' in self._cmdline (where self._cmdline is the kernel cmdline)
* /run/initramfs/open-iscsi.interface exists
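
As a rough illustration, the conditions above can be restated as a small Python sketch. This is not cloud-init's actual code: the function names and the run_dir parameter are hypothetical and exist only so the logic can be exercised against a scratch directory instead of the real /run.

```python
import glob
import os


def klibc_net_cfg_files(run_dir="/run"):
    """Return the klibc-style network config files found under run_dir."""
    return (glob.glob(os.path.join(run_dir, "net-*.conf"))
            + glob.glob(os.path.join(run_dir, "net6-*.conf")))


def looks_like_iscsi_root(cmdline, run_dir="/run"):
    """Hypothetical restatement of the detection conditions listed above.

    cmdline is the kernel command line as a single string.
    """
    # Condition 1: the initramfs must have written at least one net config.
    if not klibc_net_cfg_files(run_dir):
        return False
    # Condition 2a: an ip= or ip6= argument on the kernel command line ...
    if "ip=" in cmdline or "ip6=" in cmdline:
        return True
    # Condition 2b: ... or the interface file left behind by open-iscsi.
    return os.path.exists(
        os.path.join(run_dir, "initramfs", "open-iscsi.interface"))
```

With no ip= argument on the command line, the result depends entirely on whether the open-iscsi.interface file survived the initramfs, which is why its removal matters here.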

Dimitri John Ledkov (xnox) wrote :

... and casper must not be in use,

because I uploaded changes to casper that purge the /run/net-* files

Dimitri John Ledkov (xnox) wrote :

Also, sometime ago we added this with rcj:

# Remove the interface file if no disks are present
if [ -f /run/initramfs/open-iscsi.interface ] ; then
        found=0
        for disk in /dev/disk/by-path/*-iscsi-*; do
                if ! "$(readlink -f "$disk")" ; then
                        continue
                fi
                found=1
                break;
        done
        if [ $found = 0 ] ; then
                rm /run/initramfs/open-iscsi.interface
        fi
fi

I.e. if no disks are found, the open-iscsi.interface file might be removed. Are iSCSI disks still called /dev/disk/by-path/*-iscsi-* ? And are they visible in the initramfs?

Dan Watkins (oddbloke) wrote :

That shell code looks buggy to me; `! "$(readlink -f "$disk")"` should probably be `! readlink -f "$disk"`.

If I put the body of the if in foo.sh (and s/rm/echo rm/), I get:

$ sh foo.sh
foo.sh: 4: /dev/sda: Permission denied
foo.sh: 4: /dev/sda1: Permission denied
foo.sh: 4: /dev/sda14: Permission denied
foo.sh: 4: /dev/sda15: Permission denied
rm /run/initramfs/open-iscsi.interface

If I make that modification:

$ sh foo.sh
/dev/sda

Dimitri John Ledkov (xnox) wrote :

That code is in the open-iscsi package.

Please provide a debdiff, and I can sponsor it for you.

Dan Watkins (oddbloke)
Changed in cloud-init:
status: In Progress → Invalid
Dan Watkins (oddbloke) wrote :

I attempted a fix at https://paste.ubuntu.com/p/hsRgrpr5PT/, but it appears that /dev/disk/by-path isn't available when the top script runs at all, so the check never finds any disks. (The existing code that references those paths appears to be a special case where the script is running a second time, so it may well be correct still.)

My next attempt will be https://paste.ubuntu.com/p/cT5tDK2rdH/ which moves the snippet in question from local-top to local-bottom.

Dan Watkins (oddbloke) wrote :

OK, one remaining bug in that was resolved in https://paste.ubuntu.com/p/sgqJ5dFg5D/. (If there are no matches for the pattern in the for loop, we still go through it once with the static string as $disk, which readlink -f doesn't error on.)

Since writing the above, I've realised that we probably just want to use `readlink -e`, which I'm working on now.

Dan Watkins (oddbloke) wrote :

Oh, except we don't have -e on the readlink in busybox, so I think https://paste.ubuntu.com/p/sgqJ5dFg5D/ is the winner.
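
For readers following the logic rather than the shell details, the decision this snippet is meant to make can be sketched in Python. This is not the shipped script (which is POSIX shell running under busybox in the initramfs) and not the contents of the pastes above; the function names and parameters are hypothetical. Note that Python's glob.glob returns an empty list when nothing matches, so the literal-pattern pitfall described above does not arise here, whereas the shell version needs an explicit check.

```python
import glob
import os


def has_iscsi_disk(by_path_dir="/dev/disk/by-path"):
    """Return True if any *-iscsi-* entry resolves to an existing target."""
    for link in glob.glob(os.path.join(by_path_dir, "*-iscsi-*")):
        # realpath follows the symlink; exists() rejects dangling targets.
        if os.path.exists(os.path.realpath(link)):
            return True
    return False


def cleanup_interface_file(interface_file, by_path_dir="/dev/disk/by-path"):
    """Remove interface_file only when no iSCSI disk is present.

    Returns True if the file was removed, False otherwise.
    """
    if os.path.isfile(interface_file) and not has_iscsi_disk(by_path_dir):
        os.remove(interface_file)
        return True
    return False
```

On a bare metal instance with an iSCSI root, the expectation is that the file is kept; on a virtual machine with no iSCSI disks, it is removed.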

Dan Watkins (oddbloke) wrote :

OK, a fixed version of this is now in the focal Unapproved queue. I have manually built it and tested it on both a bare metal and a virtual machine instance in Oracle. I have confirmed that it has the expected specific behaviour in each (interface file present and absent, respectively), and also, at a higher level, that the Oracle data source is used in each case.

(Thanks to xnox for the sponsorship!)

Changed in open-iscsi (Ubuntu):
assignee: nobody → Dan Watkins (daniel-thewatkins)
status: New → In Progress
Steve Langasek (vorlon) wrote : Please test proposed package

Hello Dan, or anyone else affected,

Accepted open-iscsi into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/open-iscsi/2.0.874-7.1ubuntu6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in open-iscsi (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package open-iscsi - 2.0.874-7.1ubuntu6

---------------
open-iscsi (2.0.874-7.1ubuntu6) focal; urgency=medium

  * d/extra/initramfs.local-{top,bottom}: move removal of open-iscsi.interface
    file from local-top to local-bottom, and fix shell quoting issue that
    would result in /run/initramfs/open-iscsi.interface always being removed
    (LP: #1872813)

 -- Daniel Watkins <email address hidden> Tue, 14 Apr 2020 16:54:37 -0400

Changed in open-iscsi (Ubuntu Focal):
status: Fix Committed → Fix Released
John Chittum (jchittum) wrote :

Confirmed fix. The main symptom was inability to SSH into a running instance due to cloud-init not completing. Main test was to instantiate an instance and SSH in. Test procedures:

1. Checked the daily CPC Oracle build to ensure the build passed
2. Checked the image manifest to ensure open-iscsi 2.0.874-7.1ubuntu6 was installed
3. Uploaded the image to Oracle US-East Bucket
4. Imported the image as both PV and Native boot images (Native boot images use iSCSI)
5. Created instances of both PV and Native images
6. Checked that PV and Native images booted (Success)
7. Checked the ability to SSH into both images (Success)
8. Ran basic smoke checks for system errors, internet connectivity, etc. (systemctl, apt, and snap commands) (Success)

tags: added: id-5bbe5d6e338b8e69a2c66363
Changed in open-iscsi (Ubuntu Bionic):
assignee: nobody → Jorge Merlino (jorge-merlino)
Jorge Merlino (jorge-merlino) wrote :

SRU for Bionic

tags: added: sts
Changed in open-iscsi (Ubuntu Bionic):
status: New → In Progress
Jorge Merlino (jorge-merlino) wrote :

I tested the patch myself on an Oracle Cloud machine that presented this issue.

tags: added: sts-sponsor-halves
description: updated
Heitor Alves de Siqueira (halves) wrote :

Sponsored for Bionic. Thanks for the contribution, @jorge-merlino!

Changed in open-iscsi (Ubuntu Bionic):
importance: Undecided → High
Łukasz Zemczak (sil2100) wrote :

Hello Dan, or anyone else affected,

Accepted open-iscsi into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/open-iscsi/2.0.874-5ubuntu2.11 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in open-iscsi (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Jorge Merlino (jorge-merlino) wrote :

Tested version 2.0.874-5ubuntu2.11 from proposed and it worked fine.

https://pastebin.ubuntu.com/p/n4F5rvtMrs/

tags: added: verification-done-bionic
removed: verification-needed-bionic
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for open-iscsi has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package open-iscsi - 2.0.874-5ubuntu2.11

---------------
open-iscsi (2.0.874-5ubuntu2.11) bionic; urgency=medium

  * d/extra/initramfs.local-{top,bottom}: move removal of open-iscsi.interface
    file from local-top to local-bottom, and fix shell quoting issue that
    would result in /run/initramfs/open-iscsi.interface always being removed
    (LP: #1872813)

 -- Jorge Merlino <email address hidden> Wed, 06 Apr 2022 19:19:56 +0000

Changed in open-iscsi (Ubuntu Bionic):
status: Fix Committed → Fix Released