Comment 27 for bug 1988819

Xiao Wan (xwcal) wrote :

To add a data point and save others from preventable headaches, I encountered a related bug and reported it here:
https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2017399

If anyone cares, here are my two cents regarding phased updates:
https://github.com/xwcal/ubuntu-apt-bug#a-short-critique-of-phased-updates

Or, if the lack of markdown formatting is not an issue, here it goes:

# A Short Critique of Phased Updates

Some thoughts after I stumbled on phased updates during a recent migration from Bionic to Jammy:

## Randomness and Repeatability

Per [Debian repository specs](https://wiki.debian.org/DebianRepository/Format#Phased-Update-Percentage), randomization is achieved by the following means:

```
To determine whether an update is applicable to a given machine, a client shall seed a random number generator (APT uses std::minstd_rand) with the string:

 sourcePackage-version-machineID

where sourcePackage is the name of the source package, version the version, and machineID the contents of /etc/machine-id.

It shall then extract an integer from a uniform integer distribution between 0 and 100.
```

To the uninitiated: this is not how a random number generator is meant to be used. If you want a repeatable sequence of random numbers, you seed the RNG once and then draw the desired number of samples. You don't reseed the RNG for each new sample, since if your seeds are already random, then the RNG step is pointless, whereas if your seeds aren't random, say, in an extreme case, if you always seed with the same number, your "random" number is guaranteed to be the same each time.
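
To make the reseeding concern concrete, here is a small sketch that reimplements the `std::minstd_rand` recurrence (multiplier 48271, modulus 2^31 - 1) and looks only at the first raw value drawn after seeding. It does not try to reproduce how APT turns the seed string into engine state. The small integer seeds are purely illustrative, and `first_draw` is a name made up for this example.

```python
# Parameters of std::minstd_rand: x_next = (A * x) % M, with no increment.
A = 48271
M = 2**31 - 1  # 2147483647


def first_draw(seed: int) -> int:
    """Return the first raw output of a minstd_rand-style engine seeded with `seed`."""
    state = seed % M
    if state == 0:
        state = 1  # the engine never holds state 0, which would stay 0 forever
    return (A * state) % M


if __name__ == "__main__":
    # Illustrative seeds only; real seeds would be derived from the seed string.
    for seed in (1, 2, 3):
        draw = first_draw(seed)
        print(seed, draw, f"{draw / M:.6f}")
```

Seeds 1, 2 and 3 yield 48271, 96542 and 144813, all within the bottom 0.01% of the engine's range. With seeds that are not already random, a single draw after reseeding simply inherits the structure of the seed.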

To achieve repeatable randomness with respect to both the package and machine, there is an obviously superior choice in terms of both randomness and portability: **just use a cryptographic hash like sha256**.
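
As a rough sketch of what the hash-based alternative could look like (this is not APT's implementation), the same `sourcePackage-version-machineID` string from the spec can be fed to SHA-256 and reduced to an integer between 0 and 100. The function names, the choice of the first 8 digest bytes, and the `<=` comparison against the phased percentage are all assumptions made for illustration.

```python
import hashlib


def phase_value(source_package: str, version: str, machine_id: str) -> int:
    """Map (package, version, machine) to a repeatable integer in [0, 100]."""
    key = f"{source_package}-{version}-{machine_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer. With 64 bits of
    # input, the modulo bias over 101 buckets is negligible.
    return int.from_bytes(digest[:8], "big") % 101


def is_update_applicable(source_package: str, version: str,
                         machine_id: str, phased_percentage: int) -> bool:
    """Assumed rule for this sketch: applicable when the value is <= the percentage."""
    return phase_value(source_package, version, machine_id) <= phased_percentage
```

The result depends only on the three inputs and on SHA-256 itself, so any language with a SHA-256 implementation would compute the same value, independent of a particular C++ standard library.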

This raises the question of how repeatable the design really is.
To quote [https://wiki.debian.org/MachineId](https://wiki.debian.org/MachineId):

```
what is the machine id actually used for?

    a comprehensive list is probably not possible without grepping the code
```

That's to say, although for repeatable experiments it's foreseeably necessary to mess with the machine ID or share it publicly,
the consequences of such tinkering or disclosure are hard to assess, since nobody knows for sure how and where the machine ID is and will be used.

This leads to an easily conceivable alternative: since apt only uses the machine ID for the purpose of repeatable randomization, how about designating another user-configurable value, stored in, say, `/etc/apt/phased-update-random-seed`, only for this particular purpose and by default initialized to the machine ID?
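
A minimal sketch of the lookup this proposal implies, assuming the hypothetical file `/etc/apt/phased-update-random-seed` suggested above (it does not exist in apt today) and a made-up function name `read_phasing_seed`: prefer the dedicated seed file and fall back to `/etc/machine-id` when it is missing or empty.

```python
from pathlib import Path


def read_phasing_seed(
    seed_path: str = "/etc/apt/phased-update-random-seed",  # hypothetical file
    machine_id_path: str = "/etc/machine-id",
) -> str:
    """Return the dedicated phasing seed if set, otherwise the machine ID."""
    for candidate in (Path(seed_path), Path(machine_id_path)):
        try:
            value = candidate.read_text(encoding="utf-8").strip()
        except OSError:
            continue  # missing or unreadable file: try the next candidate
        if value:
            return value
    raise RuntimeError("no phasing seed available")
```

With such a fallback, a user could pin or publish the dedicated seed for a repeatable experiment without touching or disclosing the machine ID.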

## User Choice

Judging from [this response](https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2017399/comments/1) to my bug report (one question I forgot to ask is why security updates are phased at all) and an unsourced comment on [askubuntu](https://askubuntu.com/questions/1431940/what-are-phased-updates-and-why-does-ubuntu-use-them), it does appear that some thought has been given to treating different packages (or at least different classes of packages) differently when it comes to phasing.

It's conceivable that a user may have preferences over how each particular package is phased. For example, if a GUI component has a troubled history with my Nvidia graphics, I may prefer to be highly conservative and wait until the last minute, but this shouldn't prevent me from being adventurous when it comes to other phased packages.

Unfortunately, it seems that a user currently has to decide between all or nothing, i.e., between `APT::Get::Always-Include-Phased-Updates` and `APT::Get::Never-Include-Phased-Updates`.

How about allowing a per-package user choice between Always and Never?
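
One way such a per-package choice could behave, sketched with entirely hypothetical names (apt offers no such setting today): an explicit override wins, and packages without one follow the normal phasing decision.

```python
from enum import Enum


class PhasingChoice(Enum):
    ALWAYS = "always"    # take the phased update immediately
    NEVER = "never"      # wait until the update is fully phased
    DEFAULT = "default"  # follow the normal phasing decision


def include_phased_update(package: str, in_phase: bool,
                          overrides: dict[str, PhasingChoice]) -> bool:
    """Decide whether to install a phased update for `package`.

    `in_phase` is the outcome of the normal per-machine phasing check;
    `overrides` is a hypothetical user-supplied mapping of package names.
    """
    choice = overrides.get(package, PhasingChoice.DEFAULT)
    if choice is PhasingChoice.ALWAYS:
        return True
    if choice is PhasingChoice.NEVER:
        return False
    return in_phase


if __name__ == "__main__":
    # Hypothetical package names, for illustration only.
    prefs = {"some-gui-component": PhasingChoice.NEVER,
             "some-cli-tool": PhasingChoice.ALWAYS}
    print(include_phased_update("some-gui-component", True, prefs))  # False
    print(include_phased_update("some-cli-tool", False, prefs))      # True
    print(include_phased_update("anything-else", True, prefs))       # True
```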

Overall, I think that for such a prominent change to the cornerstone of probably half of today's Linux infrastructure, the design leaves much to be desired.