Jammy Charmed OpenStack deployment fails due to connectivity issues when using a converged OVS bridge for control and data planes

Bug #1978820 reported by Itai Levy
This bug affects 2 people
Affects           Status        Importance  Assigned to  Milestone
linux (Ubuntu)    Confirmed     Undecided   Unassigned
  Jammy           Fix Released  Medium      Unassigned

Bug Description

Platform: OpenStack Yoga, Ubuntu 22.04 Jammy, Kernel 5.15.0-37-generic

A Charmed OpenStack deployment with HW offload on the Jammy series looks healthy until the Vault initialization phase. After Vault is initialized, all DB-related applications end up in a blocked/error state with "Failed to connect to MYSQL".
Connectivity tests between DB containers on different nodes show unexplained, sporadic packet loss that prevents the DB-related applications from communicating properly.

This will happen when the following conditions are met:
1. The control plane (oam and internal spaces) is configured as VLAN interfaces on the same OVS bridge used for the data plane (on a high-speed NIC with HW offload capabilities).
2. OVS is configured with HW offload enabled (hw-offload=true); the OVN charms set this after Vault initialization.
3. The NIC has not yet been switched to "switchdev" mode (the OVN charms create the netplan file after Vault initialization, but it only takes effect after the node is rebooted).
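Conditions 2 and 3 can be checked on a node before initializing Vault. The following is a minimal sketch, not part of the charms: it assumes the standard `ovs-vsctl get Open_vSwitch . other_config:hw-offload` and `devlink dev eswitch show` commands are available, and that devlink prints a line of the form `pci/0000:03:00.0: mode legacy ...`. The function names and the PCI address are hypothetical, and condition 1 (the converged bridge) is not checked.

```python
import re
import subprocess
from typing import Optional


def parse_hw_offload(ovs_output: str) -> bool:
    """Interpret `ovs-vsctl get Open_vSwitch . other_config:hw-offload`
    output, which prints the value in quotes, e.g. "true"."""
    return ovs_output.strip().strip('"').lower() == "true"


def parse_eswitch_mode(devlink_output: str) -> Optional[str]:
    """Return the eswitch mode ('legacy' or 'switchdev') from
    `devlink dev eswitch show <dev>` output, or None if absent.

    Assumed format: 'pci/0000:03:00.0: mode legacy inline-mode none ...'.
    Requiring whitespace before 'mode' avoids matching 'inline-mode'.
    """
    match = re.search(r"(?:^|\s)mode\s+(\S+)", devlink_output)
    return match.group(1) if match else None


def at_risk(hw_offload: bool, eswitch_mode: Optional[str]) -> bool:
    """Conditions 2 and 3: OVS HW offload is enabled while the NIC is
    not (yet) in switchdev mode. Condition 1 is not checked here."""
    return hw_offload and eswitch_mode != "switchdev"


def check_node(pci_dev: str) -> bool:
    """Run both commands on the local node (typically as root) and apply
    at_risk(). pci_dev is e.g. 'pci/0000:03:00.0' (hypothetical)."""
    ovs = subprocess.run(
        ["ovs-vsctl", "get", "Open_vSwitch", ".", "other_config:hw-offload"],
        capture_output=True, text=True,
    ).stdout
    devlink = subprocess.run(
        ["devlink", "dev", "eswitch", "show", pci_dev],
        capture_output=True, text=True,
    ).stdout
    return at_risk(parse_hw_offload(ovs), parse_eswitch_mode(devlink))
```

For example, `at_risk(True, "legacy")` is True, which corresponds to the failing state described above; once the node has rebooted with the switchdev netplan configuration, the mode should read `switchdev` and the check returns False.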

The root cause is the following missing kernel patch ("net/mlx5e: TC NIC mode, fix tc chains miss table"):
https://patchwork<email address hidden>/

To reproduce:
Deploy Charmed OpenStack with HW offload, placing the control plane on the high-speed NIC's OVS bridge. Before initializing Vault, log in to one of the InnoDB instances and ping the other two instances: all pings succeed. Then manually enable OVS HW offload: the pings become inconsistent (sporadic packet loss).
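The "inconsistent ping" symptom can be quantified by reading the loss figure from the ping summary. This is a minimal sketch that assumes the iputils `ping` summary format (`N packets transmitted, M received, X% packet loss, ...`); the helper names and the peer address are hypothetical.

```python
import re
import subprocess
from typing import Optional


def packet_loss_percent(ping_output: str) -> Optional[float]:
    """Extract the loss figure from the iputils ping summary line, e.g.
    '20 packets transmitted, 14 received, 30% packet loss, time 19027ms'."""
    match = re.search(r"([\d.]+)% packet loss", ping_output)
    return float(match.group(1)) if match else None


def ping_loss(peer: str, count: int = 20) -> Optional[float]:
    """Ping a peer InnoDB instance `count` times and return the loss in
    percent. `peer` is a hypothetical address of another DB container."""
    result = subprocess.run(
        ["ping", "-c", str(count), peer],
        capture_output=True, text=True,
    )
    # No check=True: ping exits with code 1 if no replies arrive at all,
    # and the summary line is still worth parsing in that case.
    return packet_loss_percent(result.stdout)
```

Following the reproduction steps, running `ping_loss()` against each of the other two instances should report 0% loss before OVS HW offload is enabled and non-zero loss afterwards on an affected node.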

Workaround:
After the deployment bring-up phase, and BEFORE initializing Vault, log in to each node and manually create /etc/netplan/150-charm-ovn.yaml (example below). Then reboot the nodes one at a time. Once the nodes have recovered, proceed with Vault initialization to complete the deployment.

#root@node3:/home/ubuntu# cat /etc/netplan/150-charm-ovn.yaml
###############################################################################
# [ WARNING ]
# Configuration file maintained by Juju. Local changes may be overwritten.
# Config managed by ovn-chassis charm
###############################################################################
network:
  version: 2
  ethernets:
    ens1f0:
      virtual-function-count: 8
      embedded-switch-mode: switchdev
      delay-virtual-functions-rebind: true

    ens1f1:
      virtual-function-count: 8
      embedded-switch-mode: switchdev
      delay-virtual-functions-rebind: true
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 4 10:46 seq
 crw-rw---- 1 root audio 116, 33 Jul 4 10:46 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
CRDA: N/A
CasperMD5CheckResult: unknown
DistroRelease: Ubuntu 22.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant DL360 Gen9
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl icp
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-40-generic root=UUID=db1801a9-daa1-4386-b2ec-c65a40bc5dd3 ro intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1
ProcVersionSignature: Ubuntu 5.15.0-40.43-generic 5.15.35
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-40-generic N/A
 linux-backports-modules-5.15.0-40-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.2
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy uec-images
Uname: Linux 5.15.0-40-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 03/05/2015
dmi.bios.release: 1.32
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.53
dmi.modalias: dmi:bvnHP:bvrP89:bd03/05/2015:br1.32:efr2.53:svnHP:pnProLiantDL360Gen9:pvr:cvnHP:ct23:cvr:sku755258-B21:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.product.sku: 755258-B21
dmi.sys.vendor: HP

Frode Nordahl (fnordahl)
no longer affects: plan (Ubuntu)
Itai Levy (etlvnvda) wrote : AudioDevicesInUse.txt

apport information

tags: added: apport-collected jammy uec-images
description: updated
Itai Levy (etlvnvda) wrote : CurrentDmesg.txt

apport information

Itai Levy (etlvnvda) wrote : Lspci.txt

apport information

Itai Levy (etlvnvda) wrote : Lspci-vt.txt

apport information

Itai Levy (etlvnvda) wrote : Lsusb.txt

apport information

Itai Levy (etlvnvda) wrote : Lsusb-t.txt

apport information

Itai Levy (etlvnvda) wrote : Lsusb-v.txt

apport information

Itai Levy (etlvnvda) wrote : ProcCpuinfo.txt

apport information

Itai Levy (etlvnvda) wrote : ProcCpuinfoMinimal.txt

apport information

Itai Levy (etlvnvda) wrote : ProcInterrupts.txt

apport information

Itai Levy (etlvnvda) wrote :

The output file of the "apport-collect 1978820" command is attached (apport.linux.vmems9ns.apport).

Itai Levy (etlvnvda) wrote :

Frode Nordahl (fnordahl) wrote :

The single commit referenced in the description [0] applies as a clean cherry-pick to the 5.15 kernel.

0: https://patchwork<email address hidden>/

Changed in linux (Ubuntu):
status: New → Confirmed
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-43.46 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
Itai Levy (etlvnvda)
tags: added: verification-done-jammy
removed: apport-collected jammy uec-images verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-43.46

---------------
linux (5.15.0-43.46) jammy; urgency=medium

  * jammy/linux: 5.15.0-43.46 -proposed tracker (LP: #1981243)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.07.11)

  * nbd: requests can become stuck when disconnecting from server with qemu-nbd
    (LP: #1896350)
    - nbd: don't handle response without a corresponding request message
    - nbd: make sure request completion won't concurrent
    - nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    - nbd: fix io hung while disconnecting device

  * Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment
    events (LP: #1965241)
    - PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()
    - PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
    - [Config] Enable config option CONFIG_PCIE_EDR

  * [SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client
    - TP 8010 (LP: #1948626)
    - nvme: add CNTRLTYPE definitions for 'identify controller'
    - nvme: send uevent on connection up
    - nvme: expose cntrltype and dctype through sysfs

  * [UBUNTU 22.04] Kernel oops while removing device from cio_ignore list
    (LP: #1980951)
    - s390/cio: derive cdev information only for IO-subchannels

  * Jammy Charmed OpenStack deployment fails over connectivity issues when using
    converged OVS bridge for control and data planes (LP: #1978820)
    - net/mlx5e: TC NIC mode, fix tc chains miss table

  * Hairpin traffic does not work with centralized NAT gw (LP: #1967856)
    - net: openvswitch: fix misuse of the cached connection on tuple changes

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add driver data to acp6x machine driver
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * AMD ACP 6.x DMIC Supports (LP: #1949245)
    - ASoC: amd: add Yellow Carp ACP6x IP register header
    - ASoC: amd: add Yellow Carp ACP PCI driver
    - ASoC: amd: add acp6x init/de-init functions
    - ASoC: amd: add platform devices for acp6x pdm driver and dmic driver
    - ASoC: amd: add acp6x pdm platform driver
    - ASoC: amd: add acp6x irq handler
    - ASoC: amd: add acp6x pdm driver dma ops
    - ASoC: amd: add acp6x pci driver pm ops
    - ASoC: amd: add acp6x pdm driver pm ops
    - ASoC: amd: enable Yellow carp acp6x drivers build
    - ASoC: amd: create platform device for acp6x machine driver
    - ASoC: amd: add YC machine driver using dmic
    - ASoC: amd: enable Yellow Carp platform machine driver build
    - ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
    - [Config] Enable AMD ACP 6 DMIC Support

  * [UBUNTU 20.04] Include patches to avoid self-detected stall with Secure
    Execution (LP: #1979296)
    - KVM: s390: pv: add macros for UVC CC values
    - KVM: s390: pv: avoid stalls when making pages secure

  * [22.04 FEAT] KVM: Attestation support for Secure Execution (crypto)
    (LP: #1959973)
    - drivers/s390/char: Add Ultravisor io device
    - s390/uv_uapi: depend on CONFIG_S390
    - [Co...


Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Itai Levy (etlvnvda) wrote :

I can confirm the fix works with the released 5.15.0-43.46 kernel.
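Whether a node already runs a kernel containing the fix can be read from /proc/version_signature (the same value apport reports as ProcVersionSignature, e.g. "Ubuntu 5.15.0-40.43-generic 5.15.35"). The sketch below is a hypothetical helper, not an Ubuntu tool; it assumes that format and only makes sense for the jammy `linux` package, since other flavours such as linux-gkeop-5.15 use their own version numbering.

```python
import re
from typing import Optional, Tuple

# First jammy linux package version containing the fix (from the
# Launchpad Janitor comment above): 5.15.0-43.46.
FIXED_VERSION = (5, 15, 0, 43, 46)


def parse_signature(signature: str) -> Optional[Tuple[int, ...]]:
    """Parse /proc/version_signature, e.g.
    'Ubuntu 5.15.0-40.43-generic 5.15.35' -> (5, 15, 0, 40, 43)."""
    match = re.match(r"Ubuntu (\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", signature.strip())
    return tuple(int(part) for part in match.groups()) if match else None


def has_fix(signature: str) -> bool:
    """True if the signature is at or above 5.15.0-43.46. Only meaningful
    for the jammy `linux` (e.g. -generic) package, not other flavours."""
    version = parse_signature(signature)
    return version is not None and version >= FIXED_VERSION


def running_kernel_has_fix(path: str = "/proc/version_signature") -> bool:
    """Apply has_fix() to the running kernel's version signature."""
    with open(path) as f:
        return has_fix(f.read())
```

Applied to the signature in the apport data above (5.15.0-40.43), `has_fix()` returns False, consistent with the bug being present on that kernel.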

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gkeop-5.15/5.15.0-1003.5~20.04.2 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!
