No fallback if the system update process fails at any point

Bug #1371703 reported by Matthew Paul Thomas
This bug affects 3 people
Affects                  Status     Importance  Assigned to  Milestone
Canonical System Image   Confirmed  Low         Unassigned
Ubuntu system image      Triaged    Wishlist    Unassigned

Bug Description

For someone to install a system update, all of the following need to be true:
(A) The phone needs to start up completely.
(B) Unity needs to be usable (e.g. bug 1436538).
(C) The networking stack needs to let Ubuntu check for updates.
(D) If System Settings is not in the Launcher, the Apps scope needs to not crash.
(E) System Settings needs to launch without crashing.
(F) The System Settings "Updates" screen needs to open without crashing.
(G) The system-image update system itself needs to work properly.

This is a long and brittle chain. If *any one* of these steps breaks, the phone is no longer updateable. And at worst -- if A, B, or C fails -- the phone is effectively bricked.

This is not a theoretical problem. On Ubuntu for PC, crasher bugs in update-manager often persist for a long time in the list of most common errors, because the updates that fix them can't themselves be installed until a nearby geek cracks out a terminal to use apt-get. That's not so practical on a phone.

Therefore, there should be a fallback path for installing system image updates in emergency situations. This path may not avoid all of the requirements listed above, but it should avoid as many as practical.

Tags: recovery
Barry Warsaw (barry) wrote :

I agree it's critical to think about. At the lowest level of the above stack, a shell + system-image-cli + networking should be able to update the device, but it's probably not super easy for the average user. I can imagine a small script that would use adb to shell in and run the update.

If the device's network is down, then I can also imagine that script could optionally push all the data and keyring files obtained in some other way (e.g. via a web site visited on the host). Once the device is attached to USB, this script would push the files into the right place and invoke s-i-cli over adb.
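A minimal sketch of such a host-side script, in Python. It assumes `adb` is on the host's PATH and that running `system-image-cli` with no arguments applies the update. The staging directory and the function names are hypothetical: the thread does not say where the data and keyring files would have to go, and the command may need root on the device.

```python
import subprocess

# Hypothetical staging directory on the device; the real location that
# system-image expects for data and keyring files is not given in this thread.
DEVICE_STAGING_DIR = "/tmp/system-image-staging"


def build_commands(local_files=()):
    """Return the adb command lines to run, in order.

    local_files: files fetched on the host some other way (for example from
    a web site) when the device's own network is down.
    """
    commands = [["adb", "wait-for-device"]]
    for path in local_files:
        commands.append(["adb", "push", path, DEVICE_STAGING_DIR + "/"])
    # Assumption: system-image-cli with no arguments applies the update.
    commands.append(["adb", "shell", "system-image-cli"])
    return commands


def run_update(local_files=(), runner=subprocess.check_call):
    """Run each command in order, stopping at the first failure."""
    for command in build_commands(local_files):
        runner(command)
```

Keeping the command list separate from the code that executes it makes the sequence easy to inspect, or to dry-run by passing `print` as `runner`, before touching a device.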

tags: added: client
Barry Warsaw (barry)
no longer affects: system-image (Ubuntu)
Changed in ubuntu-system-image:
status: New → Triaged
importance: Undecided → Wishlist
Ondrej Kubik (ondrak) wrote :

This is a very, very rare corner case. For an update to fail at a point of no return, it has to fail while the phone is in recovery mode and the upgrader is applying the update. At that point no user interaction is possible, apart from yanking out the battery or holding the power button long enough to force a reboot... both are plain stupid things for a user to do, and he probably deserves a trip to the repair shop.
The next option is the upgrader itself failing, which would mean the filesystem is corrupted or something equally bad, and it is an open question how we could recover from that.

adb over USB is out of the question, since we disable adb in recovery because it can be considered a security hole.
Also, what if the update fails right while updating recovery? Then we can't even boot to recovery to "recover".
As we are talking about a phone, we can't really have two partitions and swap between them only after the update has booted successfully. We simply don't have another 2G lying around for this.

What we can do is make the upgrader process in recovery more robust. We already have a quite robust user-side upgrader app, which reboots to recovery only after a long list of conditions is met.
In recovery we can improve the upgrader so that it does not proceed unless all packages are properly validated. For example, I filed a bug where the upgrader formats before validating the image signature, which makes it too late to refuse an image because of a broken signature. Luckily we have a good safeguard on the user side to prevent this.
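The ordering problem can be sketched in Python as follows. This is only an illustration, not the upgrader's actual code: a SHA-256 digest comparison stands in for the real image signature check, and `format_partition` and `install` are hypothetical callables supplied by the caller.

```python
import hashlib


def sha256_of(data):
    return hashlib.sha256(data).hexdigest()


def apply_update(packages, expected_digests, format_partition, install):
    """Validate every package first; touch the device only if all pass.

    packages: dict of name -> bytes
    expected_digests: dict of name -> hex SHA-256 digest
    format_partition, install: hypothetical callables for the destructive steps
    """
    bad = [name for name, data in packages.items()
           if expected_digests.get(name) != sha256_of(data)]
    if bad:
        # Refuse before anything destructive has happened.
        raise ValueError("validation failed for: " + ", ".join(sorted(bad)))
    format_partition()
    for name, data in packages.items():
        install(name, data)
```

The point is only the order: every check runs before `format_partition()` is called, so a bad image can still be refused with the old system intact.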

As an added feature, to recover a toasted phone whose recovery is still alive, we can consider adding an adb sideload option to recovery. Sideload would be a safe way to push in a rescue image. The only issue is that we would need to introduce some single-file package with all the image parts (device/rootfs/custom) and ubuntu_command packed in. Pushing multiple images would be somewhat clumsy.
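One possible shape for such a single-file package, sketched in Python with the standard `tarfile` module. No such format is defined in this thread: the member names below are hypothetical, and a real package would also need to be signed.

```python
import io
import tarfile

# Hypothetical member names; the thread only says the package would carry
# the device, rootfs and custom parts plus ubuntu_command.
REQUIRED_MEMBERS = ("device.tar.xz", "rootfs.tar.xz", "custom.tar.xz",
                    "ubuntu_command")


def pack_rescue_image(parts):
    """Bundle the parts (name -> bytes) into one tar archive, as bytes."""
    missing = [name for name in REQUIRED_MEMBERS if name not in parts]
    if missing:
        raise ValueError("missing parts: " + ", ".join(missing))
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in REQUIRED_MEMBERS:
            info = tarfile.TarInfo(name=name)
            info.size = len(parts[name])
            archive.addfile(info, io.BytesIO(parts[name]))
    return buffer.getvalue()


def list_members(blob):
    """Return the member names of a packed rescue image."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return archive.getnames()
```

Refusing to build the archive when a part is missing keeps an incomplete rescue image from ever reaching the device.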

Barry Warsaw (barry) wrote :

Agreed. Updating the tag to reflect that this is a recovery task, not a client task. There's not much more the client can do at that point.

tags: added: recovery
removed: client
description: updated
Zygmunt Krynicki (zyga) wrote :

Nearly ten years ago I was working on an upgrade process for an STB running Linux on a 32MB NOR flash. We did consider the A/B option that we had used earlier on models with more memory. Here, that was not an option. What we implemented instead was an A/x model. A is the working (current) image. The little x is a side-image, flashed at the factory, that can only do one thing: unattended upgrades.

If we get enough boot failures we go to the x image. That image shows a basic static picture on screen telling the user to plug in the Ethernet cable (this is an STB we're talking about). The picture was displayed on all possible outputs. Once the cable is plugged in, the STB downloads the image, block by block, and flashes it live (after checking each checksum) to the A image. Once that is done we reset the failed boot counter and reboot (to A).
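The A/x flow described above can be sketched in Python like this. It is an illustration under stated assumptions, not the original STB code: the threshold value, the SHA-256 checksum and the `write_block` callable are all hypothetical.

```python
import hashlib

MAX_FAILED_BOOTS = 3  # hypothetical threshold; the comment only says "enough"


def choose_image(failed_boots):
    """Boot the fallback image x once A has failed too many times."""
    return "x" if failed_boots >= MAX_FAILED_BOOTS else "A"


def flash_blocks(blocks, write_block):
    """Write each (data, expected_sha256) block only after its checksum matches.

    Returns the new failed-boot counter (0) on success.
    """
    for index, (data, expected) in enumerate(blocks):
        if hashlib.sha256(data).hexdigest() != expected:
            # A real device would re-download the block; the sketch just stops.
            raise ValueError("checksum mismatch in block %d" % index)
        write_block(index, data)
    return 0
```

Because each block is checked before it is written, a corrupted download never reaches the flash, and the counter is reset only after every block has been written.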

No matter what happens, you can reboot and install A.

So how do we upgrade x? Ha. We had a few options.

We can never upgrade x. Surprisingly, this is a good option by many standards. It is pretty reliable in practice, which means a lower support cost and means that users can really always reboot and recover. The downsides? Obviously security updates are important here (though this is an STB so it's not a security critical product, at least, not the way we think of security *g*). The other downside is that the x image must be extremely reliable, with defensive coding, reviews and -- the best option -- a proven track record of working on other projects. Some customers used this option.

We can choose to update x after booting A successfully. This is also tricky. What if the upgrade fails? Assuming that A is still working, it can try again, and again, and again. Even if it fails you still have a working A image (and the only reason it can fail is if the power cable gets yanked mid-way or if the user has worn out the flash memory). The latter is pretty much a dead-end scenario anyway. Since this was a networked STB we could always try to update x from A. If you have little storage available, this is a pretty good option. Also, the user doesn't need to see anything, as the x update can be done without any UI. From the user's POV the product works.

Lastly, for extra paranoid safety you can do A/x1/x2. Then you can really recover from anything. Depending on the available non-volatile memory size, this option is the safest bet. I don't think we ever used this in production though.

So back to the phone. We can do A/x here IMHO, iff the recovery partition had a special, unattended recovery process. This is harder to do on a mobile device. Perhaps we could have an image that shows "plug the phone". On Ubuntu we could have appropriate udev rules that would auto-recover bricked Ubuntu phones. We could also have a QR code with a link to a page that has Windows / OS X software (once we get some) or explanation on how to do this from an Ubuntu machine. A minimal kernel, minimal userspace (no unity, no anything, just mir to display a single static image, this can even be a part of the bootloader if we look hard enough). The image doesn't need any i18n in it. On the desktop software side we'd have to have a way to do recovery flas...


Quentin Quaadgras (quentin-d) wrote :

Just want to add an example of an update process being interrupted (Bug 1454457).
In this case (G) failed when the phone restarted, and the phone was then unable to reinstall the update that had failed.

Changed in canonical-devices-system-image:
assignee: nobody → John McAleely (john.mcaleely)
importance: Undecided → Low
status: New → Confirmed
Changed in canonical-devices-system-image:
assignee: John McAleely (john.mcaleely) → nobody