Comment 8 for bug 1729145

Dmitrii Shcherbakov (dmitriis) wrote:

There is a scenario where the real rootfs is located on a bcache device. For that, we need to register the bcache device at the initrd stage, which already happens now. We then locate a file system on it, do pivot_root, and so on.

I believe the bcache<i> naming is not guaranteed at this point unless we have a rule that enforces it.

Side-tracking to our field use cases: we need /dev/bcache<i> names to persist based on superblock UUIDs. So I expect /dev/bcache<i> names to be persisted by UUID on first discovery (which corresponds to the MAAS deploy stage, not commissioning as in the case of disk serial numbers).

However, we also expect bcache<i> names to match the names in MAAS, which may not happen in this scenario because the <backing-dev-name> : bcache<i> mapping is not enforced.
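To make "persisted by UUID" concrete, here is a minimal sketch of what such a link amounts to, assuming the kernel name (e.g. bcache0) and the superblock UUID from the uevent environment (CACHED_UUID) are known. The helper name link_bcache_by_uuid is hypothetical, and the dev root is a parameter only so the sketch can be tried in a scratch directory instead of the real /dev.

```sh
# Hypothetical helper (illustration only): create a stable
# <devroot>/bcache/by-uuid/<uuid> link for a bcache device.
# $1: dev root (normally /dev)
# $2: kernel device name, e.g. bcache0
# $3: bcache superblock UUID, as carried in CACHED_UUID
link_bcache_by_uuid() {
    devroot=$1
    kname=$2
    uuid=$3
    # Without CACHED_UUID there is nothing stable to key the link on.
    if [ -z "$uuid" ]; then
        echo "no CACHED_UUID for $kname, skipping" >&2
        return 1
    fi
    mkdir -p "$devroot/bcache/by-uuid" || return 1
    # Relative target, in the style of udev's by-* links:
    # bcache/by-uuid/<uuid> -> ../../bcacheN
    ln -sf "../../$kname" "$devroot/bcache/by-uuid/$uuid"
}
```

In udev itself this would be expressed declaratively, along the lines of a rule matching ENV{CACHED_UUID}=="?*" and adding SYMLINK+="bcache/by-uuid/$env{CACHED_UUID}"; the shell version only spells out the resulting name-to-target mapping.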

Going back to https://bugs.launchpad.net/curtin/+bug/1728742, I think we can break it down into two problems:

1. bcache device numbers are not static across reboots, and we need a static mapping from superblock UUID to bcache<i> for a given device. This requires CACHED_UUID to be present in the uevent environment, which is only possible during a successful registration, when this code path is triggered. Because of the rootfs-on-bcache requirement, it makes sense to do this at the initrd stage, before we pivot_root to the real rootfs.

Doing this once systemd is running, after pivot_root and after the /dev devtmpfs has been moved to the real rootfs, doesn't sound right to me because of the double registration problem. In summary, I think /dev/bcache/by-uuid/ symlinks for bcache devices that exist at initial boot should be created by udev rules in the initrd.

This is what this bug is about.

2. bcache device names may not match the ones in MAAS. This matters for our use of Juju Storage, where we need device special files with static names and no file systems or partition tables on them. After commissioning, MAAS already holds metadata about a given machine: disk serial numbers are gathered (where present; AFAIK this is not guaranteed and is block-driver-specific, but it is a sane assumption to make), and the device names assigned during the ephemeral image boot are stored in a database along with the associated serial numbers, which can be queried to set up dname symlinks at deployment.

To make the <backing-dev-name> : bcache<i> mapping static, we essentially need a mapping from disk serial numbers to bcache superblock UUIDs, which are in turn mapped to bcache<i> names.

I would say that https://bugs.launchpad.net/curtin/+bug/1728742 is about point 2.
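The two-step mapping for point 2 (disk serial, then superblock UUID, then bcache<i>) can be sketched as a lookup over a table recorded at deploy time. The table format "<disk-serial> <superblock-uuid> <bcache-name>" and the helper names are assumptions for illustration only; neither MAAS nor curtin is claimed to produce such a file.

```sh
# Hypothetical table lookup: print column $4 of the first row of file $1
# whose column $3 equals $2; fail when there is no such row.
# Fields are compared as strings (the "" concatenation prevents awk from
# treating numeric-looking serials such as 0123 and 123 as equal).
table_lookup() {
    awk -v key="$2" -v kc="$3" -v vc="$4" \
        '($kc "") == (key "") { print $vc; found = 1; exit } END { exit !found }' "$1"
}

# Hypothetical resolver: disk serial -> superblock UUID -> bcache<i> name.
# $1: table file, $2: disk serial number
resolve_bcache_name() {
    table=$1
    serial=$2
    uuid=$(table_lookup "$table" "$serial" 1 2) || return 1
    table_lookup "$table" "$uuid" 2 3
}
```

Keying the second step on the UUID rather than reading the name straight off the serial's row mirrors the point above: the superblock UUID is the stable identity, and bcache<i> is derived from it.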

====

The rationale for point 1 is that the init script mounts a devtmpfs on /dev early on, and the init-bottom script then moves it over to the real rootfs before pivot_root is performed. systemd then runs its mount point setup code, which checks whether each entry in its hard-coded table of mount points is already a mount point and, if so, skips setting it up. So anything set up during the initrd stage stays in place after systemd runs, because the devtmpfs is moved and reused.

https://git.launchpad.net/~usd-import-team/ubuntu/+source/systemd/tree/src/core/mount-setup.c?h=applied/ubuntu/xenial-updates#n77
  { "devtmpfs", "/dev", "devtmpfs", "mode=755", MS_NOSUID|MS_STRICTATIME,

path_is_mount_point -> fd_is_mount_point
https://git.launchpad.net/~usd-import-team/ubuntu/+source/systemd/tree/src/core/mount-setup.c?h=applied/ubuntu/xenial-updates#n161

static int mount_one(const MountPoint *p, bool relabel) {
...
        r = path_is_mount_point(p->where, AT_SYMLINK_FOLLOW);
        if (r < 0 && r != -ENOENT) {
                log_full_errno((p->mode & MNT_FATAL) ? LOG_ERR : LOG_DEBUG, r, "Failed to determine whether %s is a mount point: %m", p->where);
                return (p->mode & MNT_FATAL) ? r : 0;
        }
        if (r > 0)
                return 0;
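The early return in mount_one() is what lets the moved devtmpfs survive. As a rough shell analogue (a sketch, not systemd's implementation: systemd's path_is_mount_point() relies on mount IDs and st_dev checks, whereas this just looks the path up in a mountinfo file), the skip-if-mounted logic looks like this. Both function names are hypothetical, and mount_one_sketch prints the mount command instead of running it.

```sh
# Sketch: succeed if $1 appears as a mount point (field 5) in the
# mountinfo file $2 (default /proc/self/mountinfo). Paths containing
# spaces are octal-escaped (\040) in mountinfo and are not handled here.
is_mount_point() {
    awk -v p="$1" '$5 == p { found = 1; exit } END { exit !found }' \
        "${2:-/proc/self/mountinfo}"
}

# Dry-run analogue of mount_one(): leave an existing mount point alone,
# otherwise report the mount that would be performed.
# $1: fstype (also used as source), $2: target, $3: options,
# $4: optional mountinfo file
mount_one_sketch() {
    if is_mount_point "$2" "$4"; then
        echo "skip $2: already a mount point"
        return 0
    fi
    echo "would run: mount -t $1 -o $3 $1 $2"
}
```

On a system booted through the initramfs, /dev is already listed as a devtmpfs mount, so the devtmpfs entry takes the "skip" branch, which is why links created in the initrd are still there afterwards.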

init script:
https://git.launchpad.net/~usd-import-team/ubuntu/+source/initramfs-tools/tree/init?h=applied/ubuntu/xenial-updates
[ -d /dev ] || mkdir -m 0755 /dev
...

# Note that this only becomes /dev on the real filesystem if udev's scripts
# are used; which they will be, but it's worth pointing out
if ! mount -t devtmpfs -o nosuid,mode=0755 udev /dev; then
     echo "W: devtmpfs not available, falling back to tmpfs for /dev"
     mount -t tmpfs -o nosuid,mode=0755 udev /dev
     [ -e /dev/console ] || mknod -m 0600 /dev/console c 5 1
     [ -e /dev/null ] || mknod /dev/null c 1 3
fi
...

init-bottom:
https://git.launchpad.net/~usd-import-team/ubuntu/+source/systemd/tree/debian/extra/initramfs-tools/scripts/init-bottom/udev?h=applied/ubuntu/xenial-updates

...
# move the /dev tmpfs to the rootfs
mount -n -o move /dev ${rootmnt}/dev

# create a temporary symlink to the final /dev for other initramfs scripts
if command -v nuke >/dev/null; then
  nuke /dev
else
  rm -rf /dev
fi
ln -s ${rootmnt}/dev /dev
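Since the initrd's /dev is moved to ${rootmnt}/dev rather than recreated, one way to check on a booted machine that links created by initrd udev rules survived is to list them. This is a hypothetical helper, assuming the /dev/bcache/by-uuid layout proposed above; the dev root is a parameter only for testing.

```sh
# Hypothetical check: print "<uuid> -> <device>" for each link under
# <devroot>/bcache/by-uuid. $1: dev root (defaults to /dev).
list_bcache_uuid_links() {
    dir="${1:-/dev}/bcache/by-uuid"
    [ -d "$dir" ] || { echo "no $dir" >&2; return 1; }
    for link in "$dir"/*; do
        # Skips the unexpanded glob when the directory is empty.
        [ -L "$link" ] || continue
        echo "$(basename "$link") -> $(basename "$(readlink "$link")")"
    done
}
```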