sshd does not start on newly installed desktop system

Bug #1554266 reported by Max Brustkern
Affects: UTAH
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

When I preseed a desktop install using utah for daily ISO testing, the ssh service is not running when the installed system first boots. journalctl -u ssh shows no entries. If I start the ssh service manually, it seems to work. If I reboot, sshd is running; on the first boot, however, it is not. This problem appears to have started with the March 5 image.
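
(The checks behind that description amount to roughly the following; the exact invocations are illustrative:)

  journalctl -u ssh          # no entries on the first boot
  systemctl status ssh       # inactive
  sudo systemctl start ssh   # starting it by hand works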

Revision history for this message
Colin Watson (cjwatson) wrote :

IRC speculation is that this might be due to https://launchpad.net/ubuntu/+source/init-system-helpers/1.29ubuntu1, although that isn't enough to determine whether it's a regression in i-s-h or something that openssh is doing wrong that was just exposed by that upload.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Can I pin i-s-h during install or is there some other way I can help verify this?

Revision history for this message
Martin Pitt (pitti) wrote :

I tried to purge and reinstall openssh-server on today's cloud image (with i-s-h 1.29ubuntu1) and ssh.service does start up right away as expected. So this naïve reproducer doesn't work. Max, how does openssh-server get installed in your case? Sorry for my ignorance, I don't know much about preseeding -- is this just translated into some apt-get install calls? Do they happen in a chroot with a policy-rc.d or similar?
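
(For reference, that naive reproducer boils down to roughly the following; the exact invocation is an assumption:)

  # illustrative reconstruction of the purge/reinstall test
  sudo apt-get purge -y openssh-server
  sudo apt-get install -y openssh-server
  systemctl status ssh    # expected: active (running)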

Revision history for this message
Max Brustkern (nuclearbob) wrote : Re: [Bug 1554266] Re: sshd does not start on newly installed desktop system

I re-ran this job:
https://platform-qa-jenkins.ubuntu.com/view/smoke-default/job/ubuntu-xenial-desktop-amd64-smoke-default/
It installs from the daily iso using utah. It uses a preseed here:
http://bazaar.launchpad.net/~ubuntu-test-case-dev/ubuntu-test-cases/desktop/view/head:/preseeds/default.cfg
that installs openssh-server via pkgsel.
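
(The pkgsel mechanism for pulling in extra packages is normally a preseed line of the form below; the actual file is the one linked above:)

  d-i pkgsel/include string openssh-server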

To verify what was going on, I connected virt-manager to the jenkins
executor node (venonat) and opened a console on the VM while it was waiting
to time out. If you need assistance setting up this recreate method, I can
provide that.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

FWIW, this doesn't happen on current server installs, just desktop.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

This is blocking current image promotions. Is there something I can do to provide more data for getting it fixed?

Revision history for this message
Max Brustkern (nuclearbob) wrote :

I've confirmed that openssh-server is installed on the system, and can be started manually after the first reboot. I previously saw that it started automatically after the second reboot, but I haven't confirmed that this week.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

If we enable oem config in the preseed, this problem doesn't occur.

Revision history for this message
Martin Pitt (pitti) wrote :

Are there any steps for how this can be reproduced? I already tried on a cloud image and on a desktop, Mathieu tried with pre-seeding, etc. A reproduction recipe based on downloading a current image would be most helpful for understanding what's going on.

Changed in openssh (Ubuntu):
status: New → Incomplete
Revision history for this message
Max Brustkern (nuclearbob) wrote :

The actual command running on venonat under jenkins is this:
sudo -u utah -i UTAH_CONFIG_DIR=/var/lib/jenkins-utah-iso/workspace/ubuntu-xenial-desktop-amd64-smoke-default/config run_utah_tests.py -i /var/cache/utah/iso/xenial-desktop-amd64.iso -p lp:ubuntu-test-cases/desktop/preseeds/default.cfg lp:ubuntu-test-cases/desktop/runlists/default.run -f /var/log/installer -f /var/log/installer -x /etc/utah/bridged-network-vm.xml --outputpreseed

But most of that is superfluous and specific to the job.
sudo -u utah -i run_utah_tests.py -i /var/cache/utah/iso/xenial-desktop-amd64.iso -p lp:ubuntu-test-cases/desktop/preseeds/default.cfg lp:ubuntu-test-cases/desktop/runlists/default.run -x /etc/utah/bridged-network-vm.xml
should be sufficient. I'm working on a manual recreate now. The thing that remains interesting to me is that the problem doesn't occur on server images, or when oem-config is enabled.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

My attempts at manual recreation so far have failed. I'm going to see if I can narrow down what potential elements of a preseed might be causing this.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

If I use the default server preseed:
lp:ubuntu-test-cases/server/preseeds/default.preseed
it starts ssh on server images, but not on desktop images.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

I've created some VMs to recreate this on venonat.ubuntu-ci (accessible via the VPN). The user is upgrade-test, and I've added ssh keys for pitti and cyphermox. The problem only occurs on the first boot, so once we start a VM, we'll need to clone another if we want to recreate again. As such, please do not start max-ssh-recreate-base. max-ssh-recreate-base-clone should work, and we can clone the base VM again if needed.

Revision history for this message
Martin Pitt (pitti) wrote :

Note: when running this kind of VM image, always run qemu with -snapshot so as not to modify the image. This creates a transient internal overlay for the given image and leaves the image itself untouched.

For me, ssh to venonat.ubuntu-ci fails as ubuntu@, pitti@, and root@; which user name did you use?

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Sorry, upgrade-tester is the user, I mistyped it before.

Revision history for this message
Martin Pitt (pitti) wrote :

Ah, I can log in with upgrade-tester@. How do you launch these VMs? I tried

$ qemu-system-x86_64 -enable-kvm -m 2048 -drive file=./max-ssh-recreate-base-clone.qcow2,if=virtio -snapshot -nographic

but I don't see anything because apparently these VMs don't have console=ttyS0 on the kernel command line. I suppose everything goes to tty1, which is thus invisible.
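
(One workaround for the missing serial console, assuming a VNC viewer is available on the host, would be to drop -nographic and expose the graphical console over VNC, then point a viewer at <host>:5901, e.g.:)

  qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=./max-ssh-recreate-base-clone.qcow2,if=virtio \
      -snapshot -vnc :1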

Revision history for this message
Martin Pitt (pitti) wrote :

I now ssh'ed in with -X and dropped -nographic. Painfully slow, and the screen is weirdly condensed, but this works in principle. What is the password for the UTAH user? I tried "utah", "UTAH", "ubuntu", and empty, but no luck.

Revision history for this message
Martin Pitt (pitti) wrote :

Sorry, but through ssh -X it is just impossible to type (a lot of characters get repeated 10 times, and others dropped, even if I just type one char every 5 seconds), and the mouse is also useless. How do you interact with these VMs?

Revision history for this message
Martin Pitt (pitti) wrote :

So, originally I assumed this meant that ssh wasn't running right after "apt-get install openssh-server". But you are saying that it does not work after the first *boot*? Otherwise having these VMs and booting them would not be useful for debugging anything.

So the expectation is that booting max-ssh-recreate-base-clone.qcow2 will not start the ssh server, but a reboot will? What's the output of "sudo systemctl status ssh" after the first boot? Can you please also capture "sudo journalctl -b" and attach the output?
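
(For reference, capturing that on the first boot amounts to something like the following; the output file name is just an example:)

  sudo systemctl status ssh
  sudo journalctl -b > first-boot-journal.txt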

Thanks!

(BTW, I used the guest session in my previous comment, and tried to get some data as user, but at least under unity this is utterly impossible)

Revision history for this message
Martin Pitt (pitti) wrote :

Max grabbed the journal (attached), which explains what's going on:

openssh-server is not already installed; the first boot installs it. But this happens in /etc/rc.local, which we run fairly early (right after the network is up). In the journal we see roughly this order:

784:Apr 08 13:52:27 utah-12201-xenial-amd64 systemd[1]: Reached target Network.
899:Apr 08 13:52:35 utah-12201-xenial-amd64 utah[1046]: Installing openssh-server...
900:Apr 08 13:52:35 utah-12201-xenial-amd64 sudo[1047]: root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/apt-get install -y openssh-server --force-yes
940:Apr 08 13:53:47 utah-12201-xenial-amd64 systemd[1]: Reached target Login Prompts.
941:Apr 08 13:53:47 utah-12201-xenial-amd64 systemd[1]: Reached target Multi-User System.
942:Apr 08 13:53:47 utah-12201-xenial-amd64 systemd[1]: Reached target Graphical Interface.

I.e., installing the openssh-server package happens while the boot transaction is still in progress; installing new units during that time won't cause the dependency tree of graphical.target to be recomputed, so the new ssh.service is not taken into account for starting.
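
(A quick way to see this state on the first boot, as an illustrative check:)

  systemctl is-enabled ssh   # "enabled" -- the unit is installed and enabled
  systemctl is-active ssh    # "inactive" -- but it was never part of the boot transaction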

A cleaner way would be to install openssh-server during the OS install, not as part of the first boot. Another option is to wait until the VM has finished booting, and then install it (if you have some way other than ssh to access the VM, such as the serial console).

A more hackish, but perhaps simpler way would be to change rc.local to wait until the system is booted with something like

  # wait until the boot transaction has settled
  while true; do
      s=$(systemctl is-system-running) || true
      [ "$s" != running ] && [ "$s" != degraded ] || break
      sleep 1
  done

and only then install openssh-server.

(Checking for "degraded" is more robust in case the default installation has some broken units which fail to start. This happens from time to time, like bug 1567780.)
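
(Putting that together, a minimal sketch of such an rc.local, purely as an illustration of the suggestion above and not the actual UTAH code, could look like:)

  #!/bin/sh -e
  # Wait for the boot transaction to finish ("running", or "degraded" if
  # some unrelated unit failed) before installing packages that ship units.
  while true; do
      s=$(systemctl is-system-running) || true
      [ "$s" != running ] && [ "$s" != degraded ] || break
      sleep 1
  done
  # A normal post-boot installation: ssh.service gets started by the
  # package's maintainer scripts as usual.
  apt-get install -y openssh-server
  exit 0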

Changed in openssh (Ubuntu):
status: Incomplete → Triaged
affects: openssh (Ubuntu) → utah
Revision history for this message
Martin Pitt (pitti) wrote :

The current bug is a race condition which is highly timing dependent: as long as downloading and unpacking the ssh debs takes longer than the rest of the boot, all is well -- but as soon as the network or the disk gets faster, or booting takes longer, ssh gets configured while the boot is still going on, and it won't start. Interesting time bomb :-)

Revision history for this message
Max Brustkern (nuclearbob) wrote :

I'm working on getting these changes into utah. Thanks for all the analysis and suggestions!

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Tests are now passing in production. Thanks all!

Changed in utah:
status: Triaged → Fix Released