Azure: Fix perf regression: remove rx_cqes, tx_cqes counters for MANA

Bug #2022940 reported by Tim Gardner
This bug affects 1 person
Affects                 Status         Importance   Assigned to    Milestone
linux-azure (Ubuntu)    Fix Released   Undecided    Unassigned
  Lunar                 Fix Released   Medium       Tim Gardner

Bug Description

SRU Justification

[Impact]

net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=1919b39fc6eabb9a6f9a51706ff6d03865f5df29

It resolves a significant performance regression (25% in the iperf test described below).

More details:
There is one apc->eth_stats.rx_cqes per NIC (vport), and it sits on the
frequent, parallel code path of all queues. Reads and writes to this
single shared variable by many threads on different CPUs create a
lot of caching and memory overhead, hence the perf regression. The
counter is also inaccurate because of the high volume of concurrent
reads and writes.

For example, with an iperf workload using 128 threads and RPS
enabled, we saw a 25% perf regression with the previous patch that
added the counters. This patch eliminates the regression.

Since the error path of mana_poll_rx_cq() already has warnings,
keeping the counter and converting it to a per-queue variable is not
necessary. So this patch simply removes the counter from this
high-frequency code path.
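The per-queue alternative that the commit message considers (and rejects as unnecessary) can be sketched as follows. This is a minimal userspace illustration in C with pthreads, not kernel or mana driver code: NUM_QUEUES, struct queue_stats, queue_count_cqe(), run_demo() and the other names are hypothetical, and the 64-byte cache line size is an assumption. Each worker thread increments only its own cache-line-aligned counter, and the per-queue values are summed only when the statistics are read, so the hot path never writes to a cache line that another CPU is also writing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical sketch: these names are illustrative only and are not
 * the mana driver's API. CACHE_LINE = 64 is an assumed line size. */
#define NUM_QUEUES 8
#define CACHE_LINE 64

/* Aligning the member pads each element to its own cache line, so a CPU
 * updating queue 0 does not invalidate the line holding queue 1's count. */
struct queue_stats {
    _Alignas(CACHE_LINE) uint64_t cqes;
};

static struct queue_stats stats[NUM_QUEUES];

/* Hot path: only the thread that owns queue q ever writes stats[q]. */
static void queue_count_cqe(int q)
{
    assert(q >= 0 && q < NUM_QUEUES);
    stats[q].cqes++;
}

struct worker_arg {
    int queue;
    uint64_t events;
};

/* Simulates one queue's poll loop processing `events` completions. */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (uint64_t i = 0; i < a->events; i++)
        queue_count_cqe(a->queue);
    return NULL;
}

/* Slow path: aggregate the per-queue values only when stats are read. */
static uint64_t total_cqes(void)
{
    uint64_t sum = 0;
    for (int q = 0; q < NUM_QUEUES; q++)
        sum += stats[q].cqes;
    return sum;
}

/* Runs one thread per queue, waits for all of them, returns the total. */
static uint64_t run_demo(uint64_t events_per_queue)
{
    pthread_t tids[NUM_QUEUES];
    struct worker_arg args[NUM_QUEUES];

    for (int q = 0; q < NUM_QUEUES; q++) {
        stats[q].cqes = 0;
        args[q] = (struct worker_arg){ .queue = q, .events = events_per_queue };
        pthread_create(&tids[q], NULL, worker, &args[q]);
    }
    /* pthread_join makes each worker's writes visible to this thread. */
    for (int q = 0; q < NUM_QUEUES; q++)
        pthread_join(tids[q], NULL);
    return total_cqes();
}
```

With 8 queues, run_demo(1000) should return 8000. The patch itself takes the simpler route of dropping the counters entirely, since errors on these paths are already reported through warnings and other counters.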

The tx_cqes counter is removed for the same reason: that path already
has warnings and other counters for errors, so there is no need to
count every normally processed CQE.

[Test Plan]

Tested by MSFT.

[Regression potential]

The rx_cqes and tx_cqes counters are removed; user-space programs that read them may be affected.

[Other Info]

SF: #00361807

Tim Gardner (timg-tpi)
affects: linux (Ubuntu) → linux-azure (Ubuntu)
Changed in linux-azure (Ubuntu Lunar):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Changed in linux-azure (Ubuntu):
status: New → Fix Released
Tim Gardner (timg-tpi)
Changed in linux-azure (Ubuntu Lunar):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.2.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-azure verification-needed-lunar
Tim Gardner (timg-tpi)
tags: added: verification-done-lunar
removed: verification-needed-lunar
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 6.2.0-1009.9

---------------
linux-azure (6.2.0-1009.9) lunar; urgency=medium

  * lunar/linux-azure: 6.2.0-1009.9 -proposed tracker (LP: #2026476)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis

  * Azure: Fix lockup in swiotlb when used as a CVM (LP: #2026736)
    - swiotlb: remove swiotlb_max_segment
    - swiotlb: fix the deadlock in swiotlb_do_find_slots
    - swiotlb: use wrap_area_index() instead of open-coding it
    - swiotlb: fix slot alignment checks
    - swiotlb: fix a braino in the alignment check fix

  * [Azure] Fix VM crash/hang issues due to fast VF add/remove events
    (LP: #2023071) // Case [Azure] Fix VM crash/hang issues due to fast VF
    add/remove events (LP: #2023594)
    - PCI: hv: Fix a race condition bug in hv_pci_query_relations()
    - PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
    - PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
    - Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
    - PCI: hv: Add a per-bus mutex state_lock
    - PCI: hv: Use async probing to reduce boot time

  * Azure: Fix perf regression: remove rx_cqes, tx_cqes counters for MANA
    (LP: #2022940)
    - net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters

  * [Azure][MANA][VLANTagging] Support for VLAN Tagging for MANA (LP: #2023695)
    - net: mana: Add support for vlan tagging

  [ Ubuntu: 6.2.0-27.28 ]

  * lunar/linux: 6.2.0-27.28 -proposed tracker (LP: #2026488)
  * Packaging resync (LP: #1786013)
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] update annotations scripts
  * CVE-2023-2640 // CVE-2023-32629
    - Revert "UBUNTU: SAUCE: overlayfs: handle idmapped mounts in
      ovl_do_(set|remove)xattr"
    - Revert "UBUNTU: SAUCE: overlayfs: Skip permission checking for
      trusted.overlayfs.* xattrs"
    - SAUCE: overlayfs: default to userxattr when mounted from non initial user
      namespace
  * UNII-4 5.9G Band support request on 8852BE (LP: #2023952)
    - wifi: rtw89: 8851b: add 8851B basic chip_info
    - wifi: rtw89: introduce realtek ACPI DSM method
    - wifi: rtw89: regd: judge UNII-4 according to BIOS and chip
    - wifi: rtw89: support U-NII-4 channels on 5GHz band
  * Disable hv-kvp-daemon if /dev/vmbus/hv_kvp is not present (LP: #2024900)
    - [Packaging] disable hv-kvp-daemon if needed
  * A deadlock issue in scsi rescan task while resuming from S3 (LP: #2018566)
    - ata: libata-scsi: Avoid deadlock on rescan after device resume
  * [SRU] Intel Sapphire Rapids HBM support needs CONFIG_NUMA_EMU (LP: #2008745)
    - [Config] Intel Sapphire Rapids HBM support needs CONFIG_NUMA_EMU
  * Lunar update: v6.2.15 upstream stable release (LP: #2025067)
    - ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
    - ASoC: Intel: soc-acpi: add table for Intel 'Rooks County' NUC M15
    - ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
    - x86/hyperv: Block root partition functionality in a Confidential VM
    - ASoC: amd: yc: Add DMI entries to support Victus by HP Laptop 16-e1xxx
      (8A22)
    - iio:...

Changed in linux-azure (Ubuntu Lunar):
status: Fix Committed → Fix Released