failing console access on s390x, ppc64el

Bug #1630909 reported by Christian Ehrhardt 
This bug affects 4 people
Affects               Status     Importance  Assigned to  Milestone
autopkgtest (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Hi,
as briefly discussed on IRC, I wanted to create an s390x ADT environment but failed to do so.
It seems the extra consoles are not created correctly.

This dumps the state so far, to be debugged and fixed later on, or for whoever runs into the same issue to find it via search.

Creating a new image works as long as one specifies the alternative ports mirror:

sudo ~/autopkgtest-4.1/tools/autopkgtest-buildvm-ubuntu-cloud -v --arch=s390x --mirror=http://ports.ubuntu.com/ubuntu-ports -r yakkety -s 10G

But then running a test fails like this:
(yes I know old syntax)
sudo ~/autopkgtest-4.1/runner/adt-run -ddd --shell-fail --apt-upgrade --no-built-binaries --source neutron_9.0.0~rc3-0ubuntu1.dsc --- adt-virt-qemu --cpus 4 --ram-size=2048 ~/autopkgtest-yakkety-s390x.img
adt-run: WARNING: "adt-run" is deprecated, please use "autopkgtest" (see manpage)
adt-run: DBG: Parsed options: Namespace(apt_pocket=[], auto_control=True, build_parallel=None, copy=[], env=[], gainroot=None, logfile=None, output_dir=None, set_lang=None, setup_commands=['(apt-get update || (sleep 15; apt-get update) || (sleep 60; apt-get update) || false) && $(which eatmydata || true) apt-get dist-upgrade -y -o Dpkg::Options::="--force-confnew"'], shell=False, shell_fail=True, summary=None, timeout_build=None, timeout_copy=None, timeout_factor=1.0, timeout_install=None, timeout_short=None, timeout_test=None, user=None, verbosity=2)
adt-run: DBG: Remaining arguments: ['--no-built-binaries', '--source', 'neutron_9.0.0~rc3-0ubuntu1.dsc']
adt-run: DBG: Interpreted actions: ['--no-built-binaries', '--source', 'neutron_9.0.0~rc3-0ubuntu1.dsc']
adt-run: DBG: Virt runner arguments: ['adt-virt-qemu', '--cpus', '4', '--ram-size=2048', '/home/ubuntu/autopkgtest-yakkety-s390x.img']
adt-run: DBG: testbed init
adt-run [04:29:22]: version @version@
adt-run [04:29:22]: host s1lp5; command line: /home/ubuntu/autopkgtest-4.1/runner/adt-run -ddd --shell-fail --apt-upgrade --no-built-binaries --source 'neutron_9.0.0~rc3-0ubuntu1.dsc' --- adt-virt-qemu --cpus 4 --ram-size=2048 /home/ubuntu/autopkgtest-yakkety-s390x.img
adt-run: DBG: got reply from testbed: ok
adt-run: DBG: testbed open, scratch=None
adt-run: DBG: sending command to testbed: open
adt-run: DBG: got reply from testbed: Using SCSI scheme.
adt-run: DBG: TestbedFailure sent `open', got `Using SCSI scheme.', expected `ok...'
adt-run: DBG: testbed stop
adt-run: DBG: testbed close, scratch=None
adt-run: DBG: sending command to testbed: quit
qemu-system-s390x: terminating on signal 15 from pid 53355
<VirtSubproc>: failure: timed out waiting for "login prompt on ttyS0"

After discussing on IRC I understood that it checks for a login on ttyS0 or, as a fallback, a root shell on ttyS1.
But it seems none of those get spawned.
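
For readers who have not seen this kind of check before, waiting for a login prompt on a console socket can be pictured roughly as below. This is only a minimal sketch and not autopkgtest's actual code: the helper name `wait_for_pattern`, the default timeout, and the idea of matching on a byte pattern such as `b"login: "` are assumptions made for illustration.

```python
import socket
import time


def wait_for_pattern(sock_path, pattern, timeout=60.0):
    """Return True once `pattern` (bytes) shows up on a unix socket.

    Hypothetical helper: connects to a console socket such as
    <workdir>/ttyS0 and reads until the pattern appears, the peer
    closes the connection, or the timeout expires.
    """
    deadline = time.monotonic() + timeout
    seen = b""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        while time.monotonic() < deadline:
            # Never pass 0 here: that would switch to non-blocking mode.
            s.settimeout(max(deadline - time.monotonic(), 0.01))
            try:
                chunk = s.recv(4096)
            except socket.timeout:
                break
            if not chunk:  # peer closed the connection
                break
            seen += chunk
            if pattern in seen:
                return True
    return False
```

With no console device in the guest, nothing is ever written to the socket, so a check of this kind can only end in the timeout reported above.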

I was able to boot the image just fine with:
sudo qemu-system-s390x -m 2048 -smp 4 -nographic -net nic,model=virtio -net user,hostfwd=tcp::10022-:22 -drive file=/home/ubuntu/autopkgtest-yakkety-s390x.img,cache=unsafe,if=virtio,index=0 -enable-kvm

But one has to note that in this case the default console is automatically connected to stdio (due to -nographic) and the mode is an sclp console.

I can add a virtio-serial console and get a valid non-sclp console by appending:
-chardev socket,path=/tmp/port0,server,nowait,id=port0-char -device virtio-serial -device virtserialport,id=port1,name=org.fedoraproject.port.0,chardev=port0-char

I can add more serial consoles on unix sockets as autopkgtest does, but nothing appears on these sockets (only the monitor gets some content).
-serial unix:/tmp/testadt/ttyS0,server,nowait -serial unix:/tmp/testadt/ttyS1,server,nowait -monitor unix:/tmp/testadt/monitor,server,nowait

In the guest no device appears for any of those, so no tty can spawn on them:
ll /dev/tty*
crw-rw-rw- 1 root tty 5, 0 Oct 6 00:18 /dev/tty
crw------- 1 ubuntu tty 4, 65 Oct 6 05:16 /dev/ttysclp0
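
The same check can be scripted inside the guest. The following is a small sketch, assuming only the device name patterns that come up in this report (ttyS*, hvc*, ttysclp*); the function name `list_console_devices` is made up for illustration.

```python
import glob
import os

# Name patterns seen in this report: ttyS* (classic serial ports),
# hvc* (hypervisor/virtio consoles), ttysclp* (s390x SCLP console).
CONSOLE_GLOBS = ("ttyS*", "hvc*", "ttysclp*")


def list_console_devices(dev_dir="/dev"):
    """Return the console-like device nodes found below dev_dir."""
    found = []
    for pattern in CONSOLE_GLOBS:
        found.extend(sorted(glob.glob(os.path.join(dev_dir, pattern))))
    return found
```

In the state shown above this would return only the ttysclp0 node, since plain /dev/tty matches none of the patterns.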

I feel I should know how to configure it further to get valid consoles, but I need to read some docs first, so I am documenting the current state for now.

Christian Ehrhardt  (paelzer) wrote :

The root cause seems to be that I am missing something needed to properly add the console to the guest.
As listed above, it has no /dev/ttyS0 or anything similar.

Usually adt would set console=ttyS0 on the kernel parmline, which it doesn't do on s390x yet.
But obviously doing so only ends in failing to spawn anything there, as the tty device doesn't exist.

[...]
[*** ] A start job is running for dev-ttyS0.device (6s / 1min 30s)
[ TIME ] Timed out waiting for device dev-ttyS0.device.

The main console still survives on (falls back to?) sclp0:
ubuntu@autopkgtest:~$ tty
/dev/ttysclp0

Whatever I am missing, I expect the fix to start with getting more than just:
ll /dev/tty*
crw-rw-rw- 1 root tty 5, 0 Oct 6 00:18 /dev/tty
crw------- 1 ubuntu tty 4, 65 Oct 6 05:16 /dev/ttysclp0

Christian Ehrhardt  (paelzer) wrote :

After discussing with my friends working on KVM on s390x, I found the following to get me "extra consoles onto unix sockets". The default, as mentioned, is an sclp console; that is required for early messages and is also the recommended and tested one.

For further consoles they recommend using virtio-serial/virtconsoles. Based on the feedback I constructed these parameters for my adt qemu:

-device virtio-serial -chardev socket,path=/tmp/testadt/ttyhvc,server,nowait,id=ttyhvc -device virtconsole,chardev=ttyhvc,name=org.fedoraproject.console.ttyhvc

That gives me a login prompt on /tmp/testadt/ttyhvc just as adt needs it.

I first thought it might need a modification like "console=hvc0" on the kernel command line.
But the hvc consoles seem to be auto-initialized if they exist, so no change inside the guest is needed.

So the current default:
-serial unix:/tmp/autopkgtest-virt-qemu.84_ut07n/ttyS0,server,nowait -serial unix:/tmp/autopkgtest-virt-qemu.84_ut07n/ttyS1,server,nowait
should for s390x be converted to use the one outlined above.
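
As a rough sketch of what such a conversion could look like, the console arguments can be built per architecture. The function name `console_args` and the `arch` parameter are assumptions for illustration; the option strings are the ones quoted in this report, with the socket names ttyS0/ttyS1 kept so the host side finds them in the usual place.

```python
def console_args(arch, workdir, names=("ttyS0", "ttyS1")):
    """Build QEMU console arguments for the given architecture.

    Hypothetical helper: s390x gets virtio-serial virtconsoles, every
    other architecture keeps the current '-serial unix:...' default.
    """
    if arch != "s390x":
        args = []
        for name in names:
            args += ["-serial", "unix:%s/%s,server,nowait" % (workdir, name)]
        return args

    # One virtio-serial bus, then one chardev + virtconsole per socket.
    args = ["-device", "virtio-serial"]
    for name in names:
        args += [
            "-chardev",
            "socket,path=%s/%s,server,nowait,id=%s" % (workdir, name, name),
            "-device",
            "virtconsole,chardev=%s,name=org.fedoraproject.console.%s"
            % (name, name),
        ]
    return args
```

The returned list is meant to be spliced into the QEMU argv in place of the two '-serial' entries.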

Christian Ehrhardt  (paelzer) wrote :

I hacked up this:
--- virt/autopkgtest-virt-qemu.orig 2016-10-06 07:13:55.409049291 -0400
+++ virt/autopkgtest-virt-qemu 2016-10-06 07:21:49.639298404 -0400
@@ -525,6 +525,8 @@
         nic_opt = ''

     # start QEMU
+ #'-serial', 'unix:%s/ttyS0,server,nowait' % workdir,
+ #'-serial', 'unix:%s/ttyS1,server,nowait' % workdir,
     argv = [args.qemu_command,
             '-m', str(args.ram_size),
             '-smp', str(args.cpus),
@@ -532,8 +534,11 @@
             '-net', 'nic,model=virtio',
             '-net', 'user' + nic_opt,
             '-monitor', 'unix:%s/monitor,server,nowait' % workdir,
-            '-serial', 'unix:%s/ttyS0,server,nowait' % workdir,
-            '-serial', 'unix:%s/ttyS1,server,nowait' % workdir,
+            '-device', 'virtio-serial',
+            '-chardev', 'socket,path=%s/ttyS0,server,nowait,id=ttyS0' % workdir,
+            '-device', 'virtconsole,chardev=ttyS0,name=org.fedoraproject.console.ttyS0',
+            '-chardev', 'socket,path=%s/ttyS1,server,nowait,id=ttyS1' % workdir,
+            '-device', 'virtconsole,chardev=ttyS1,name=org.fedoraproject.console.ttyS1',
             '-virtfs',
             'local,id=autopkgtest,path=%s,security_model=none,mount_tag=autopkgtest' % shareddir,
             '-drive', 'file=%s,cache=unsafe,if=virtio,index=0' % overlay]

And ran it with:
sudo ~/autopkgtest-4.1/runner/adt-run -ddd --shell-fail --apt-upgrade --no-built-binaries --source neutron_9.0.0~rc3-0ubuntu1.dsc --- adt-virt-qemu --cpus 4 --ram-size=2048 --user ubuntu --password ubuntu ~/autopkgtest-yakkety-s390x.img

But that runs into various issues not s390x related - I opened extra bug 1630963 for those as far as I could debug them easily.

Leaving this for your consideration and proper inclusion once you find the time (and an s390x machine).

Christian Ehrhardt  (paelzer) wrote :

I found that setup-testbed failed/forgot to place the console-enabling files.
They needed to be adapted anyway. I now have these, which give me a root console:

sudo nc -U /tmp/testadt/ttyhvc1
# id
id
uid=0(root) gid=0(root) groups=0(root)

ubuntu@autopkgtest:~$ cat /etc/init.d/autopkgtest
#!/bin/sh
### BEGIN INIT INFO
# Provides: autopkgtest
# Required-Start: $all
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop:
### END INIT INFO

if [ "$1" = start ]; then
    echo "Starting root shell on hvc1 for autopkgtest"
    (setsid sh </dev/hvc1 >/dev/hvc1 2>&1) &
fi
ubuntu@autopkgtest:~$ cat /etc/systemd/system/autopkgtest.service
[Unit]
Description=autopkgtest root shell on hvc1
ConditionPathExists=/dev/hvc1

[Service]
ExecStart=/bin/sh
StandardInput=tty-fail
StandardOutput=tty
StandardError=tty
TTYPath=/dev/hvc1
SendSIGHUP=yes
# ignore I/O errors on unusable hvc1
SuccessExitStatus=0 208 SIGHUP SIGINT SIGTERM SIGPIPE

[Install]
WantedBy=multi-user.target
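
Since the only thing that changes between architectures in this unit is the tty device, it could be generated from a template. The sketch below assumes a helper called `render_unit` (a made-up name) and simply reproduces the unit shown above with the device name as a parameter.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=autopkgtest root shell on {tty}
ConditionPathExists=/dev/{tty}

[Service]
ExecStart=/bin/sh
StandardInput=tty-fail
StandardOutput=tty
StandardError=tty
TTYPath=/dev/{tty}
SendSIGHUP=yes
# ignore I/O errors on unusable {tty}
SuccessExitStatus=0 208 SIGHUP SIGINT SIGTERM SIGPIPE

[Install]
WantedBy=multi-user.target
"""


def render_unit(tty):
    """Return the autopkgtest.service text for a device such as 'hvc1'."""
    return UNIT_TEMPLATE.format(tty=tty)
```

For example, `render_unit("hvc1")` yields the file listed above, and `render_unit("ttyS1")` the variant for guests that have classic serial ports.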

Christian Ehrhardt  (paelzer) wrote :

With everything mentioned above done, I hit:
KVM_S390_MEM_OP failed: Cannot allocate memory
qemu-system-s390x: terminating on signal 15 from pid 56570
autopkgtest-virt-qemu: DBG: cleanup...
<VirtSubproc>: failure: timed out on client shared directory setup

A discussion revealed that before qemu 2.7, s390 only had virtio-9p-ccw and no mapping to virtio-9p-pci. So for a while we might need to use the long options for that.

The following is the s390 version, squashing the former and the new change:

@@ -532,10 +534,13 @@
             '-net', 'nic,model=virtio',
             '-net', 'user' + nic_opt,
             '-monitor', 'unix:%s/monitor,server,nowait' % workdir,
-            '-serial', 'unix:%s/ttyS0,server,nowait' % workdir,
-            '-serial', 'unix:%s/ttyS1,server,nowait' % workdir,
-            '-virtfs',
-            'local,id=autopkgtest,path=%s,security_model=none,mount_tag=autopkgtest' % shareddir,
+            '-device', 'virtio-serial',
+            '-chardev', 'socket,path=%s/ttyS0,server,nowait,id=ttyS0' % workdir,
+            '-device', 'virtconsole,chardev=ttyS0,name=org.fedoraproject.console.ttyS0',
+            '-chardev', 'socket,path=%s/ttyS1,server,nowait,id=ttyS1' % workdir,
+            '-device', 'virtconsole,chardev=ttyS1,name=org.fedoraproject.console.ttyS1',
+            '-fsdev', 'local,id=autopkgtest,path=%s,security_model=none' % shareddir,
+            '-device', 'virtio-9p-ccw,fsdev=autopkgtest,mount_tag=autopkgtest',
             '-drive', 'file=%s,cache=unsafe,if=virtio,index=0' % overlay]
     for i, image in enumerate(args.image[1:]):
         argv.append('-drive')
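
The 9p part of that change could also be kept architecture-dependent instead of replacing the default for everyone. A minimal sketch, assuming a helper named `shared_dir_args` and an `arch` parameter (both made up here); the option strings are the ones from the diff above.

```python
def shared_dir_args(arch, shareddir, tag="autopkgtest"):
    """Build the QEMU arguments that export shareddir to the guest via 9p.

    Hypothetical helper: s390x uses the long '-fsdev' + '-device
    virtio-9p-ccw' form, everything else keeps the '-virtfs' shorthand.
    """
    if arch == "s390x":
        return [
            "-fsdev",
            "local,id=%s,path=%s,security_model=none" % (tag, shareddir),
            "-device",
            "virtio-9p-ccw,fsdev=%s,mount_tag=%s" % (tag, tag),
        ]
    return [
        "-virtfs",
        "local,id=%s,path=%s,security_model=none,mount_tag=%s"
        % (tag, shareddir, tag),
    ]
```

Whether the long form is still needed on qemu 2.7 and later is not something this sketch tries to decide; it only separates the two spellings.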

That got me further again, but now I'm stopped at:
autopkgtest-virt-qemu: DBG: Copying host timezone America/New_York to VM
autopkgtest-virt-qemu: DBG: expect: "#"
autopkgtest-virt-qemu: DBG: expect: found ""b'#'""
autopkgtest-virt-qemu: DBG: expect: "/python"
autopkgtest-virt-qemu: DBG: expect: found ""b'/python'""
autopkgtest-virt-qemu: DBG: expect: "# "
autopkgtest-virt-qemu: DBG: expect: found ""b'# '""
autopkgtest-virt-qemu: DBG: expect: "# "
autopkgtest-virt-qemu: DBG: expect: found ""b'# '""
autopkgtest-virt-qemu: DBG: execute-timeout: /tmp/autopkgtest-virt-qemu.i1ulemwi/runcmd true
autopkgtest-virt-qemu: DBG: can connect to autopkgtest sh in VM
autopkgtest-virt-qemu: DBG: determine_normal_user: got user "ubuntu"
autopkgtest-virt-qemu: DBG: auxverb = ['/tmp/autopkgtest-virt-qemu.i1ulemwi/runcmd'], downtmp = None
autopkgtest-virt-qemu: DBG: execute-timeout: /tmp/autopkgtest-virt-qemu.i1ulemwi/runcmd mktemp --directory --tmpdir autopkgtest.XXXXXX
autopkgtest-virt-qemu: DBG: execute-timeout: /tmp/autopkgtest-virt-qemu.i1ulemwi/runcmd chmod 1777 /tmp/autopkgtest.MKTMnx
autopkgtest-virt-qemu: DBG: cleanup...
qemu-system-s390x: terminating on signal 15 from pid 56709
adt-run: DBG: TestbedFailure testbed gave exit status -13 after quit

Christian Ehrhardt  (paelzer) wrote :

I hit the same - or very very similar - on ppc64el today.

After a:
adt-buildvm-ubuntu-cloud -a ppc64el -r zesty -s 20G -m http://ports.ubuntu.com/ubuntu-ports

Things are not working.

-serial there maps to hvc consoles.
The first is always the normal boot console and that is fine as it eventually is a login console.
But the root console on ttyS1 socket is not set up correctly.
The second -serial argument that autopkgtest adds is mapped to hvc1 by the guest.
That and the fact that the setup commands did not run correctly renders it useless.
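
Put as code, the device naming observed in this report looks roughly like the sketch below. The function name `guest_console_device` is made up, the s390x entry assumes the virtconsole setup from the earlier comments, and the ttyS fallback for other architectures is an assumption based on autopkgtest's default socket names.

```python
def guest_console_device(arch, index):
    """Return the guest device expected for the index-th extra console.

    Hypothetical mapping based only on what this report observed:
    ppc64el maps '-serial' to hvcN, s390x virtconsoles show up as hvcN,
    and other architectures are assumed to use ttySN.
    """
    if arch in ("s390x", "ppc64el"):
        return "/dev/hvc%d" % index
    return "/dev/ttyS%d" % index
```

So the root shell that is normally started on ttyS1 would have to be started on `guest_console_device("ppc64el", 1)`, that is /dev/hvc1.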

To get it working, let the guest spawn a root console on hvc1, and autopkgtest will work there.
To do so I went into the base image:
$ sudo kvm -m 2048 -smp 4 -nographic -net nic,model=virtio -net user,hostfwd=tcp::10022-:22 -drive file=/home/ubuntu/cpaelzer/adt-zesty-ppc64el-cloud.img,cache=unsafe,if=virtio,index=0

There I placed a modified version of the autopkgtest.service file (hvc1 instead of ttyS1):
[Unit]
Description=autopkgtest root shell on hvc1
ConditionPathExists=/dev/hvc1

[Service]
ExecStart=/bin/sh
StandardInput=tty-fail
StandardOutput=tty
StandardError=tty
TTYPath=/dev/hvc1
SendSIGHUP=yes
# ignore I/O errors on unusable hvc1
SuccessExitStatus=0 208 SIGHUP SIGINT SIGTERM SIGPIPE

[Install]
WantedBy=multi-user.target

And finally I had to explicitly enable it in this case:
$ systemctl enable autopkgtest

When reproducing with the double -serial arg set in direct invocation, I could confirm that there is now a /bin/sh spawned on hvc1.

After shutting that down, autopkgtest worked on ppc64el.

Christian Ehrhardt  (paelzer) wrote :

Please let me know if you'd want me to split the ppc64el case into a different bug.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in autopkgtest (Ubuntu):
status: New → Confirmed

Andreas Hasenack (ahasenack) wrote :

I'm also having console access issues with ppc64el. autopkgtest fails saying there was no root login on ttyS1.

summary: - failing console access on s390x
+ failing console access on s390x, ppc64el

Christian Ehrhardt  (paelzer) wrote :

The workaround mentioned in comment #6 seems to still work when applying it in the guest.

I get the service to start after changing ttyS1 to hvc1

systemctl status autopkgtest.service
● autopkgtest.service - autopkgtest root shell on hvc1
   Loaded: loaded (/etc/systemd/system/autopkgtest.service; enabled; vendor pres
   Active: active (running) since Tue 2018-11-27 11:45:05 UTC; 6s ago
 Main PID: 1246 (sh)
    Tasks: 1 (limit: 552)
   Memory: 768.0K
   CGroup: /system.slice/autopkgtest.service
           └─1246 /bin/sh

Nov 27 11:45:05 autopkgtest systemd[1]: Started autopkgtest root shell on hvc1.

And I get into the then-ready root login on the ttyS1 socket.
$ sudo nc -U /tmp/ttyS1
# id
id
uid=0(root) gid=0(root) groups=0(root)
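
The same probe can be done without nc. The following is a minimal sketch, assuming a helper named `run_on_root_shell` (a made-up name) that sends one command to the shell behind the socket and collects whatever comes back until the connection goes quiet or closes.

```python
import socket


def run_on_root_shell(sock_path, command, timeout=5.0):
    """Send one command line to a shell on a unix socket, return its output.

    Hypothetical helper: reads until the peer closes the connection or
    nothing arrives for `timeout` seconds.
    """
    data = b""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        s.sendall(command.encode() + b"\n")
        try:
            while True:
                chunk = s.recv(4096)
                if not chunk:  # peer closed the connection
                    break
                data += chunk
        except socket.timeout:
            pass  # no more output within the timeout
    return data.decode(errors="replace")
```

Called as `run_on_root_shell("/tmp/ttyS1", "id")`, it should return the echoed command and the uid line shown above if the root shell is alive.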

But while those elements still work and the same "no root login" issue is no longer triggered, it no longer works (on Disco) to drive autopkgtest.
It seems to "just" hang.
The log stays empty and nothing else happens.

Christian Ehrhardt  (paelzer) wrote :

Running with debug enabled shows that it actually runs fine until it tries to create a temp directory:

Something around systemd-coredump does not sound good.

autopkgtest-virt-qemu: DBG: execute-timeout: /tmp/autopkgtest-qemu.us495jpz/runcmd true
autopkgtest-virt-qemu: DBG: can connect to autopkgtest sh in VM
autopkgtest-virt-qemu: DBG: determine_normal_user: got user "systemd-coredump"
autopkgtest-virt-qemu: DBG: auxverb = ['/tmp/autopkgtest-qemu.us495jpz/runcmd'], downtmp = None
autopkgtest-virt-qemu: DBG: execute-timeout: /tmp/autopkgtest-qemu.us495jpz/runcmd mktemp --directory --tmpdir autopkgtest.XXXXXX
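
The log above shows determine_normal_user returning "systemd-coredump", which is a system account rather than a login user. How autopkgtest selects that user is not shown in this report, so the following is only a hypothetical, stricter selection: it assumes (unverified) that the account ended up in the regular UID range and filters out accounts without a usable login shell. The function name, the UID bounds and the list of no-login shells are all assumptions.

```python
# Shells that mark an account as not meant for interactive login (assumed list).
NOLOGIN_SHELLS = ("/usr/sbin/nologin", "/sbin/nologin", "/bin/false")


def pick_normal_user(passwd_text, min_uid=1000, max_uid=59999):
    """Return the lowest-UID regular user with a login shell, or None.

    Hypothetical helper working on the text of /etc/passwd
    (or the output of 'getent passwd').
    """
    candidates = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed or empty lines
        name, _pw, uid, _gid, _gecos, _home, shell = fields
        if not uid.isdigit():
            continue
        if min_uid <= int(uid) <= max_uid and shell not in NOLOGIN_SHELLS:
            candidates.append((int(uid), name))
    return min(candidates)[1] if candidates else None
```

With such a filter, an entry like systemd-coredump with a nologin shell would be skipped even if its UID happened to fall into the regular range.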

In this state I can't log in anymore at all.

I found a few processes like these:
$ /usr/bin/python3 /tmp/autopkgtest-qemu.us495jpz/runcmd mktemp --directory --tmpdir autopkgtest.XXXXXX
I think these are the host counterparts to the hanging guest processes.

This makes it work even less :-/
