linux-cloud-tools-common: Ensure hv-kvp-daemon.service starts before walinuxagent.service

Bug #1739107 reported by David Coronel
This bug affects 2 people
Affects          Status         Importance   Assigned to       Milestone
linux (Ubuntu)   Fix Released   Medium       Eric Desrochers
Xenial           Fix Released   Medium       Eric Desrochers
Zesty            Won't Fix      Undecided    Unassigned
Artful           Won't Fix      Undecided    Unassigned
Bionic           Fix Released   Medium       Eric Desrochers

Bug Description

This is a request to make a change in the hv-kvp-daemon systemd service which is part of the linux-cloud-tools-common package to ensure the hv-kvp-daemon service starts before the walinuxagent service. The default dependencies make hv-kvp-daemon wait until the whole system is up before it can start.

Currently the /lib/systemd/system/hv-kvp-daemon.service file looks like this:

====================
# On Azure/Hyper-V systems start the hv_kvp_daemon
#
# author "Andy Whitcroft <email address hidden>"
[Unit]
Description=Hyper-V KVP Protocol Daemon
ConditionVirtualization=microsoft

[Service]
ExecStart=/usr/sbin/hv_kvp_daemon -n

[Install]
WantedBy=multi-user.target
====================

The suggested modification is to make the [Unit] section look like this:

[Unit]
Description=Hyper-V KVP Protocol Daemon
ConditionVirtualization=microsoft
DefaultDependencies=no
After=systemd-remount-fs.service
Before=shutdown.target cloud-init-local.service walinuxagent.service
Conflicts=shutdown.target
RequiresMountsFor=/var/lib/hyperv
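
To try the proposed ordering on a system that still ships the original unit, the additions could be carried in a systemd drop-in rather than by editing the packaged file. This is only a sketch: the function name, the drop-in directory, and the file name ordering.conf are assumptions, not part of the proposed patch. A drop-in works here because every line either sets a boolean or adds to a dependency list.

```shell
# Print a drop-in fragment carrying only the additions proposed above.
# Description= and ConditionVirtualization= stay in the packaged unit.
print_kvp_dropin() {
  cat <<'EOF'
[Unit]
DefaultDependencies=no
After=systemd-remount-fs.service
Before=shutdown.target cloud-init-local.service walinuxagent.service
Conflicts=shutdown.target
RequiresMountsFor=/var/lib/hyperv
EOF
}

# Hypothetical usage as root (paths and file name are assumptions):
#   mkdir -p /etc/systemd/system/hv-kvp-daemon.service.d
#   print_kvp_dropin > /etc/systemd/system/hv-kvp-daemon.service.d/ordering.conf
#   systemctl daemon-reload
```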

The hv-kvp-daemon service is not currently part of the critical-chain:

$ systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @10.809s
└─multi-user.target @10.723s
└─ephemeral-disk-warning.service @10.538s +31ms
└─cloud-config.service @8.249s +2.252s
└─basic.target @8.044s
└─sockets.target @8.019s
└─snapd.socket @7.692s +264ms
└─sysinit.target @6.719s
└─cloud-init.service @5.803s +842ms
└─networking.service @5.137s +612ms
└─network-pre.target @5.074s
└─cloud-init-local.service @2.257s +2.783s
└─systemd-remount-fs.service @1.368s +656ms
└─systemd-journald.socket @1.218s
└─-.mount @649ms
└─system.slice @653ms
└─-.slice @649ms

In an Azure VM, the startup time in my test is currently:
$ systemd-analyze
Startup finished in 10.375s (kernel) + 12.352s (userspace) = 22.728s

After making the suggested change, the startup time is similar:

$ systemd-analyze
Startup finished in 9.759s (kernel) + 11.867s (userspace) = 21.627s
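
For reference, the two totals above can be compared with a small helper. This is a rough sketch: the function names are made up for illustration, and the sed pattern assumes the sub-minute output format shown in this report. Each figure comes from a single boot, so the difference should be read as an indication rather than a measurement.

```shell
# Extract the total from one line of `systemd-analyze` output.
# Assumes the sub-minute format shown above ("... = 22.728s").
total_seconds() {
  sed -n 's/^Startup finished in .* = \([0-9.]*\)s$/\1/p'
}

# Print a - b with millisecond precision.
delta_seconds() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a - b }'
}

before=$(echo 'Startup finished in 10.375s (kernel) + 12.352s (userspace) = 22.728s' | total_seconds)
after=$(echo 'Startup finished in 9.759s (kernel) + 11.867s (userspace) = 21.627s' | total_seconds)

# Difference between the two boots, in seconds.
delta_seconds "$before" "$after"
```

The two boots differ by roughly 1.1 seconds in total, which is consistent with the description of the startup time as similar.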

And the service is now in the critical-chain:

$ systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @10.666s
└─multi-user.target @10.636s
└─ephemeral-disk-warning.service @10.556s +36ms
└─cloud-config.service @8.423s +2.095s
└─basic.target @8.124s
└─sockets.target @8.101s
└─lxd.socket @7.677s +326ms
└─sysinit.target @6.755s
└─cloud-init.service @5.814s +908ms
└─networking.service @5.111s +651ms
└─network-pre.target @5.087s
└─cloud-init-local.service @2.345s +2.707s
└─hv-kvp-daemon.service @2.316s
└─systemd-remount-fs.service @1.253s +680ms
└─system.slice @1.225s
└─-.slice @650ms
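
Besides the critical-chain output, the ordering can be checked from the unit's properties. The helper below is a sketch with a hypothetical name; it only parses a "Before=..." line such as the one printed by `systemctl show -p Before hv-kvp-daemon.service`, so it can be tried without a Hyper-V guest.

```shell
# Succeeds if unit $2 appears in a "Property=unit1 unit2 ..." line given as $1.
lists_unit() {
  case " ${1#*=} " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical usage on a patched system:
#   lists_unit "$(systemctl show -p Before hv-kvp-daemon.service)" walinuxagent.service \
#     && echo "hv-kvp-daemon.service is ordered before walinuxagent.service"
```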

The ConditionVirtualization=microsoft line ensures that this change does not affect non-Microsoft virtualization environments (e.g. qemu, kvm, vmware, xen). systemd checks whether the system is running in a virtualized environment and, optionally, whether it is a specific implementation, in this case "microsoft" for Hyper-V.

https://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html#
microsoft = Hyper-V, also known as Viridian or Windows Server Virtualization
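
To see what a given machine reports, systemd-detect-virt can be queried directly. A minimal sketch, assuming systemd-detect-virt is on PATH; the function name is hypothetical, and the optional argument exists only so the check can be exercised on machines that are not Hyper-V guests.

```shell
# Succeeds when the virtualization name is "microsoft" (Hyper-V).
# With no argument, ask systemd-detect-virt for the detected VM type.
is_hyperv() {
  virt=${1:-$(systemd-detect-virt --vm 2>/dev/null)}
  [ "$virt" = "microsoft" ]
}

# Hypothetical usage:
#   is_hyperv && echo "ConditionVirtualization=microsoft should be satisfied here"
```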

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1739107

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Eric Desrochers (slashd)
tags: added: sts
tags: added: azure
Eric Desrochers (slashd) wrote :

Waiting for confirmation from the Azure team that the proposed fix for all Ubuntu stable releases works as expected under their internal validation before submitting it to the kernel-team ML.

Regards,
Eric

Eric Desrochers (slashd) wrote :

0001-UBUNTU-Debian-hyper-v-Ensure-that-hv-kvp-daemon.serv.patch

tags: added: patch
Eric Desrochers (slashd) wrote :

Microsoft Azure has not yet come back to us about comment #3.

I will gladly resume work on this LP once we get an update. For now I've set the bug to "Incomplete", as it can only be tested internally by Microsoft.

Regards,
Eric

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
Changed in linux (Ubuntu Zesty):
status: New → Incomplete
Changed in linux (Ubuntu Artful):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: Triaged → Incomplete
Andy (andyliuliming) wrote :

@Eric Desrochers,
I'm from the Azure team.
Xenial and Bionic have been tested, and the fix works well.
I will test Zesty and Artful today and post the results here.

Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Andy (andyliuliming)
Andy (andyliuliming)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Andy (andyliuliming)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Andy (andyliuliming)
Changed in linux (Ubuntu Artful):
assignee: nobody → Andy (andyliuliming)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Andy (andyliuliming)
Changed in linux (Ubuntu Xenial):
status: Incomplete → In Progress
Changed in linux (Ubuntu Zesty):
status: Incomplete → In Progress
Changed in linux (Ubuntu Artful):
status: Incomplete → In Progress
Changed in linux (Ubuntu Bionic):
status: Incomplete → In Progress
Andy (andyliuliming) wrote :

Verified, everything works.

Could you please help with "submitting to kernel-team ML"? I do not know what that means :)
Maybe it's a process before merging the PRs :)

Changed in linux (Ubuntu):
assignee: Andy (andyliuliming) → nobody
Changed in linux (Ubuntu Xenial):
assignee: Andy (andyliuliming) → nobody
Changed in linux (Ubuntu Zesty):
assignee: Andy (andyliuliming) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Andy (andyliuliming) → nobody
Changed in linux (Ubuntu Artful):
assignee: Andy (andyliuliming) → nobody
Eric Desrochers (slashd) wrote :

Zesty and Artful are not needed, as both releases are no longer supported.

Testing Xenial & Bionic, as you did, will suffice for now.

Eric Desrochers (slashd)
Changed in linux (Ubuntu Artful):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Zesty):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Eric Desrochers (slashd)
Changed in linux (Ubuntu):
assignee: nobody → Eric Desrochers (slashd)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Eric Desrochers (slashd)
Eric Desrochers (slashd) wrote :

The patch has received 2 ACKs from the Ubuntu kernel team and has so far been applied to the cosmic master-next branch and the unstable master branch.

Eric Desrochers (slashd) wrote :

APPLIED: [PATCH][XBC] UBUNTU: [Debian] hyper-v -- Ensure that hv-kvp-daemon.service starts before walinuxagent.service

to xenial/master-next and bionic/master-next

Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.17.0-9.10

---------------
linux (4.17.0-9.10) cosmic; urgency=medium

  * linux: 4.17.0-9.10 -proposed tracker (LP: #1787988)

  * Cosmic update to 4.17.17 stable release (LP: #1787973)
    - x86/speculation/l1tf: Exempt zeroed PTEs from inversion
    - Linux 4.17.17

  * Cosmic update to 4.17.16 stable release (LP: #1787972)
    - x86/l1tf: Fix build error seen if CONFIG_KVM_INTEL is disabled
    - x86: i8259: Add missing include file
    - x86/platform/UV: Mark memblock related init code and data correctly
    - x86/mm/pti: Clear Global bit more aggressively
    - xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
    - x86/mm: Disable ioremap free page handling on x86-PAE
    - kbuild: verify that $DEPMOD is installed
    - crypto: ccree - fix finup
    - crypto: ccree - fix iv handling
    - crypto: ccp - Check for NULL PSP pointer at module unload
    - crypto: ccp - Fix command completion detection race
    - crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2()
    - crypto: vmac - require a block cipher with 128-bit block size
    - crypto: vmac - separate tfm and request context
    - crypto: blkcipher - fix crash flushing dcache in error path
    - crypto: ablkcipher - fix crash flushing dcache in error path
    - crypto: skcipher - fix aligning block size in skcipher_copy_iv()
    - crypto: skcipher - fix crash flushing dcache in error path
    - ioremap: Update pgtable free interfaces with addr
    - x86/mm: Add TLB purge to free pmd/pte page interfaces
    - Linux 4.17.16

  * Cosmic update to 4.17.16 stable release (LP: #1787972) // CVE-2018-9363
    - Bluetooth: hidp: buffer overflow in hidp_process_report

  * linux-cloud-tools-common: Ensure hv-kvp-daemon.service starts before
    walinuxagent.service (LP: #1739107)
    - [Debian] hyper-v -- Ensure that hv-kvp-daemon.service starts before
      walinuxagent.service

  * Miscellaneous Ubuntu changes
    - [Packaging] retpoline -- fix temporary filenaming

linux (4.17.0-8.9) cosmic; urgency=medium

  * linux: 4.17.0-8.9 -proposed tracker (LP: #1787259)

  * Cosmic update to v4.17.15 stable release (LP: #1787257)
    - parisc: Enable CONFIG_MLONGCALLS by default
    - parisc: Define mb() and add memory barriers to assembler unlock sequences
    - Mark HI and TASKLET softirq synchronous
    - stop_machine: Disable preemption after queueing stopper threads
    - sched/deadline: Update rq_clock of later_rq when pushing a task
    - zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature
    - xen/netfront: don't cache skb_shinfo()
    - bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
    - bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
    - scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management
      enabled
    - scsi: qla2xxx: Fix memory leak for allocating abort IOCB
    - init: rename and re-order boot_cpu_state_init()
    - root dentries need RCU-delayed freeing
    - make sure that __dentry_kill() always invalidates d_seq, unhashed or not
    - fix mntput/mntput race
    - fix __legitimize_mnt()/mntput() race
    - ARM: dts: imx6sx: fix irq for pcie bridge
  ...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Andy (andyliuliming)
tags: added: verification-done-bionic
removed: verification-needed-bionic
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Andy (andyliuliming)
tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-34.37

---------------
linux (4.15.0-34.37) bionic; urgency=medium

  * linux: 4.15.0-34.37 -proposed tracker (LP: #1788744)

  * Bionic update: upstream stable patchset 2018-08-09 (LP: #1786352)
    - MIPS: c-r4k: Fix data corruption related to cache coherence
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - fs: don't scan the inode cache before SB_BORN is set
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - do d_instantiate/unlock_new_inode combinations safely
    - mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
    - arm64: lse: Add early clobbers to some input/output asm operands
    - powerpc/64s: Clear PCR on boot
    - IB/hfi1: Use after free race condition in send context error path
    - IB/umem: Use the correct mm during ib_umem_release
    - idr: fix invalid ptr dereference on item delete
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - mm/kasan: don't vfree() nonexistent vm_area
    - kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - KVM: s390: vsie: fix < 8k check for the itdba
    - KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
    - kvm: x86: IA32_ARCH_CAPABILITIES is always supported
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - MIPS: generic: Fix machine compatible matching
    - mac80211: mesh: fix wrong mesh TTL offset calculation
    - ARC: Fix malformed ARC_EMUL_UNALIGNED default
    - ptr_ring: prevent integer overflow when calculating size
    - arm64: dts: rockchip: fix rock64 gmac2io stability issues
    - arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
    - libata: Fix compile warning with ATA_DEBUG enabled
    - selftests: sync: missing CFLAGS while compiling
    - selftest/vDSO: fix O=
    - selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
    - selftests: memfd: add config fragment for fuse
    - ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
    - ARM: OMAP3: Fix prm wake interrupt for resume
    - ARM: OMAP2+: Fix sar_base inititalization for HS omaps
    - ARM: OMAP1: clock: Fix debugfs_create_*() usage
    - tls: retrun the correct IV in getsockopt
    - xhci: workaround for AMD Promontory disabled ports w...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-135.161

---------------
linux (4.4.0-135.161) xenial; urgency=medium

  * linux: 4.4.0-135.161 -proposed tracker (LP: #1788766)

  * [Regression] APM Merlin boards fail to recover link after interface down/up
    (LP: #1785739)
    - net: phylib: fix interrupts re-enablement in phy_start
    - net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT

  * qeth: don't clobber buffer on async TX completion (LP: #1786057)
    - s390/qeth: don't clobber buffer on async TX completion

  * nvme: avoid cqe corruption (LP: #1788035)
    - nvme: avoid cqe corruption when update at the same time as read

  * CacheFiles: Error: Overlong wait for old active object to go away.
    (LP: #1776254)
    - cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
    - cachefiles: Wait rather than BUG'ing on "Unexpected object collision"

  * fscache cookie refcount updated incorrectly during fscache object allocation
    (LP: #1776277) // fscache cookie refcount updated incorrectly during fscache
    object allocation (LP: #1776277)
    - fscache: Fix reference overput in fscache_attach_object() error handling

  * FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false (LP: #1774336)
    - Revert "UBUNTU: SAUCE: CacheFiles: fix a read_waiter/read_copier race"
    - fscache: Allow cancelled operations to be enqueued
    - cachefiles: Fix refcounting bug in backing-file read monitoring

  * linux-cloud-tools-common: Ensure hv-kvp-daemon.service starts before
    walinuxagent.service (LP: #1739107)
    - [Debian] hyper-v -- Ensure that hv-kvp-daemon.service starts before
      walinuxagent.service

 -- Khalid Elmously <email address hidden> Sun, 26 Aug 2018 23:56:50 -0400

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Guillaume Penin (guillaume-penin) wrote :

Hi all,

I suspect that this fix is now preventing unattended-upgrades (in shutdown mode) from upgrading the linux-cloud-tools-common package.

The update process hangs during "Preparing to unpack". The server restarts after the unattended-upgrades service timeout expires.

Package state after reboot :

iFR linux-cloud-tools-common 4.15.0-34.37 all Linux kernel version specific cloud tools for version 4.15.0

The impact is currently severe, as all security updates are blocked until each server is fixed manually with:

dpkg --configure -a
apt install --only-upgrade linux-cloud-tools-common

As a simple straightforward fix, replacing :

Before=shutdown.target cloud-init-local.service walinuxagent.service

with :

Before=shutdown.target walinuxagent.service

makes the package upgradable during shutdown.
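
As far as I understand systemd, a drop-in can only add to dependency lists such as Before=, not remove entries, so trying the change above locally would mean overriding the whole unit with an edited copy. The sketch below is a sed filter that makes exactly that one-line edit; the function name is hypothetical, and the copy under /etc/systemd/system shown in the usage comment is an assumption about how one might apply it, not the packaged fix.

```shell
# Remove cloud-init-local.service from the unit's Before= line.
# Reads a unit file on stdin and writes the edited unit to stdout;
# all other lines pass through unchanged.
drop_cloud_init_ordering() {
  sed 's/^\(Before=.*\) cloud-init-local\.service\(.*\)$/\1\2/'
}

# Hypothetical usage as root:
#   drop_cloud_init_ordering < /lib/systemd/system/hv-kvp-daemon.service \
#     > /etc/systemd/system/hv-kvp-daemon.service
#   systemctl daemon-reload
```

Note that a full copy in /etc/systemd/system takes precedence over the packaged unit, so it would also shadow any later fix shipped in the package until it is removed.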

I have reported this bug in another bug report : https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1796376

Thanks for your help.

Brad Figg (brad-figg)
tags: added: cscc