sshd does not start after update on non-Ubuntu kernels where fchownat() is broken

Bug #1814124 reported by Wojciech Sulewski
This bug affects 2 people
Affects: openssh (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

[Triage Notes]

This issue occurs on Ubuntu derivatives because of problematic symlink handling on those systems. See bug 1804847 for details, and comment 10 below for a workaround.

Proper Ubuntu systems do not appear to be affected.

[Original Description]

After running a system update with:
apt-get clean && apt-get autoclean && apt-get autoremove && apt-get update && apt-get upgrade && apt-get dist-upgrade && reboot

ssh server stops starting at system boot.

It starts after doing:
mkdir /var/run/sshd
chmod 0755 /var/run/sshd
service ssh start

It happens on fresh Ubuntu 16.04 installs on every VPS provider I have tested so far.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: openssh-server 1:7.2p2-4ubuntu2.6
Uname: Linux 2.6.32-042stab127.2 x86_64
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
Date: Thu Jan 31 10:18:56 2019
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
SSHDConfig: Error: command ['/usr/sbin/sshd', '-T'] failed with exit code 255: Missing privilege separation directory: /var/run/sshd
SourcePackage: openssh
UpgradeStatus: No upgrade log present (probably fresh install)

Wojciech Sulewski (sulewski) wrote :
Christian Ehrhardt  (paelzer) wrote :

Hi,
I took a fresh system and it had openssh-server installed right away (as it is part of all the base images).
/var/run/sshd was at 0755 root:root already

I was wondering if the package install would kill the permission/ownership, so I did
  $ apt install --reinstall openssh-server
But things stayed fine.

Ok, then I purged the package and removed the path to then install it from scratch.
  $ apt remove --purge openssh-server
  $ rmdir /var/run/sshd
# ensured that /var/run/sshd really doesn't exist anymore
  $ apt install openssh-server

The path is back and permissions are ok.

I can't find how you got into this situation :-/
I'm puzzled:
- what are you upgrading from and to: trusty -> xenial, or just package versions within xenial?
- do you have any idea where the bad permissions on /var/run/sshd might come from in your case?
- if you follow my second example (purge, rmdir, install), does the path get created correctly on your system?

Changed in openssh (Ubuntu):
status: New → Incomplete
Wojciech Sulewski (sulewski) wrote :

I did some additional tests and reinstalling openssh-server does not break it. It only breaks after the update procedure.

To recreate the error, take a fresh OpenVZ 6 template of Ubuntu 16.04 64-bit and simply run:
apt-get clean && apt-get autoclean && apt-get autoremove && apt-get update && apt-get upgrade && apt-get dist-upgrade && reboot

Maybe some package other than openssh-server is breaking things.

Here is some of the output of the update procedure:
Setting up systemd (229-4ubuntu21.15) ...
addgroup: The group `systemd-journal' already exists as a system group. Exiting.
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Failed to openat(/dev/simfs): Operation not permitted
Failed to validate path /var/run/screen: Too many levels of symbolic links
Failed to validate path /var/run/sshd: Too many levels of symbolic links
Failed to validate path /var/run/sudo: Too many levels of symbolic links
Failed to validate path /var/run/sudo/ts: Too many levels of symbolic links

...

Unpacking openssh-server (1:7.2p2-4ubuntu2.6) over (1:7.2p2-4ubuntu2.1) ...
Preparing to unpack .../openssh-client_1%3a7.2p2-4ubuntu2.6_amd64.deb ...

...

Setting up openssh-server (1:7.2p2-4ubuntu2.6) ...
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
Failed to validate path /var/run/sshd: Too many levels of symbolic links

Wojciech Sulewski (sulewski) wrote :

BTW the "fix" I got from here:
https://askubuntu.com/questions/739164/ssh-connection-refused

seems like an old problem.

Christian Ehrhardt  (paelzer) wrote :

I took a fresh Xenial (daily) as well as a Xenial of the release day and ran the commands:
$ apt-get clean && apt-get autoclean && apt-get autoremove && apt-get update && apt-get upgrade && apt-get dist-upgrade && reboot

Obviously they updated different numbers of packages, but neither run broke the permissions of /var/run/sshd.

I wonder if the issue is in the OpenVZ 6 template that you use as that seems to be the only difference that remains. I wonder if you'd have any chance to do that in LXD or a KVM Guest as a comparison?

I downloaded the template from [1] and didn't find anything obvious.

But after all /var/run is actually /run and that is a tmpfs mount - so after a reboot nothing of the former run should be there. It should only contain things created since boot.
I wondered what exactly will recreate that path.

It isn't the service itself as that fails:
  $ systemctl stop sshd
  $ rm -rf /run/sshd/
  $ /usr/sbin/sshd -t
    Missing privilege separation directory: /var/run/sshd
    (The service behaves the same on start, so something else must have created the path)

After a reboot it is there and has the correct permissions.

The old SysV init script at /etc/init.d/ssh:71 would have done that, but under systemd it should no longer run. Here the directory is created by systemd-tmpfiles:
You should have a file like:

$ cat /usr/lib/tmpfiles.d/sshd.conf
d /var/run/sshd 0755 root root

That makes systemd prepare the directory as it should be on every boot.

Maybe something in that regard is broken on your OpenVZ container or template?
Please check:
1. is /var/run a symlink to /run?
2. is /run a tmpfs mount?
3. does /usr/lib/tmpfiles.d/sshd.conf exist, with the content I have shown?
4. if /var/run/sshd is not correct after boot, run `systemd-tmpfiles --create`: is it created (or are its permissions fixed)?

[1]: https://wiki.openvz.org/Download/template/precreated
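
Checks 1 to 3 above can be scripted. The following is only a minimal sketch: the function names are made up for illustration and are not part of any package, and check 1 assumes the symlink target is literally /run. Check 4 still has to be run by hand as root.

```shell
#!/bin/sh
# Diagnostic helpers for checks 1-3. All function names are hypothetical.

# Check 1: succeeds if $1 is a symlink whose target is exactly $2.
is_symlink_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# Check 2: prints the filesystem type of mount point $1, reading the
# mounts table in $2 (defaults to /proc/mounts). The last match wins,
# since a later mount shadows an earlier one on the same path.
fs_type_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# Check 3: succeeds if file $1 contains a line exactly equal to $2.
has_line() {
    [ -f "$1" ] && grep -qxF -- "$2" "$1"
}

# Runs the three checks against the live system, one result line each.
report() {
    if is_symlink_to /var/run /run; then
        echo "1 ok: /var/run -> /run"
    else
        echo "1 FAIL: /var/run is not a symlink to /run"
    fi
    if [ "$(fs_type_of /run)" = "tmpfs" ]; then
        echo "2 ok: /run is tmpfs"
    else
        echo "2 FAIL: /run is not a tmpfs mount"
    fi
    if has_line /usr/lib/tmpfiles.d/sshd.conf 'd /var/run/sshd 0755 root root'; then
        echo "3 ok: sshd.conf has the expected entry"
    else
        echo "3 FAIL: sshd.conf missing or different"
    fi
}
```

After sourcing the file, calling `report` prints one line per check. Passing all three does not rule out the problem, since the failure may only appear when systemd-tmpfiles actually runs.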

Wojciech Sulewski (sulewski) wrote :

I run several OpenVZ VPS servers from 4 different providers, and last month I lost ssh access to all of them after doing an update. I have never had such a problem before, and I update regularly.
I don't have access to KVM to check it.

1
lrwxrwxrwx 1 root root 4 Feb 3 15:16 /var/run -> /run

2
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)

3
$ cat /usr/lib/tmpfiles.d/sshd.conf
d /var/run/sshd 0755 root root

4
I will check it in a few days, once I get one of my nodes out of production and do a fresh install.

Seth Arnold (seth-arnold) wrote :

Hello Wojciech, please make sure you're on a new enough version of OpenVZ's kernel, see https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1804847 comment #20 for more information.

Thanks

Wojciech Sulewski (sulewski) wrote :

I cannot choose the OpenVZ version or kernel. I can only pass some information on to my VPS providers, and as far as I can see all of them are using version 6.

Peter Passchier (peter-passchier) wrote :

The entry in /usr/lib/tmpfiles.d/sshd.conf SHOULD be:
d /run/sshd 0755 root root

When it isn't, sshd cannot be started up after a reboot.

Robie Basak (racb) wrote :

> The entry in /usr/lib/tmpfiles.d/sshd.conf SHOULD be...

I don't agree, even if this happens to work for you. It's valid for it to be /var/run/sshd provided that your system is properly up-to-date (see related bug 1804847). If you're using some environment that is broken, please ask the people who develop that environment to fix the problem.

I'm reluctant to "just" propose changing it the other way in a stable release, because other users may be regressed in other ways by that change. If your system is broken in this behaviour, it may be broken in other ways too that will manifest later.

Since there's no action planned to be taken in Ubuntu for this behaviour, I'm marking the bug status Invalid to make this clear to users.

For users with broken systems, the workaround for this specific symptom (rather than the general problem) is to override your tmpfiles.d entry in /etc/tmpfiles.d. DO NOT EDIT /usr/lib/tmpfiles.d/sshd.conf since this will be overwritten in a future package update. The right way to make local configuration changes is in /etc/tmpfiles.d/. See tmpfiles.d(5) for details. I'd appreciate if somebody could test and provide step-by-step instructions to help other users.
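
A minimal sketch of such an override follows. It has not been verified on an affected container. It assumes that the /run/sshd entry suggested in the earlier comment is what works on the broken system, and it relies on the tmpfiles.d(5) rule that a file in /etc/tmpfiles.d takes precedence over a file of the same name in /usr/lib/tmpfiles.d. The function name and its optional prefix argument are made up here so that the file can be written to a scratch directory first.

```shell
#!/bin/sh
# Sketch of the /etc/tmpfiles.d override. write_sshd_override is a
# hypothetical helper name, not part of any package.

# Writes <prefix>/etc/tmpfiles.d/sshd.conf. With no argument the prefix
# is empty and the real /etc/tmpfiles.d is used (needs root).
# The entry uses /run/sshd directly instead of /var/run/sshd; this is
# the assumption taken from the earlier comment in this report.
write_sshd_override() {
    prefix="${1:-}"
    mkdir -p "$prefix/etc/tmpfiles.d" &&
    printf 'd /run/sshd 0755 root root\n' \
        > "$prefix/etc/tmpfiles.d/sshd.conf"
}

# Intended use on an affected system, as root:
#   write_sshd_override
#   systemd-tmpfiles --create
#   service ssh start
# Then reboot and confirm that /run/sshd exists and sshd is running.
```

Because /usr/lib/tmpfiles.d/sshd.conf is left untouched, removing /etc/tmpfiles.d/sshd.conf restores the packaged behaviour once the underlying kernel problem is fixed.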

Changed in openssh (Ubuntu):
status: Incomplete → Invalid
description: updated
description: updated
summary: - sshd does not start after update
+ sshd does not start after update on non-Ubuntu kernels where fchownat()
+ is broken
Christian Ehrhardt  (paelzer) wrote :

The same discussion and more is in bug 1811580; marking as a duplicate so that people who find this bug all end up at the same main bug.
