Azure: Fix perf regression: remove rx_cqes, tx_cqes counters for MANA
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-azure (Ubuntu) | Fix Released | Undecided | Unassigned |
Lunar | Fix Released | Medium | Tim Gardner |
Bug Description
SRU Justification
[Impact]
net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters
https:/
It resolves a big perf regression.
More details:
The apc->eth_
frequent and parallel code path of all queues. So, r/w into this
single shared variable by many threads on different CPUs creates a
lot of caching and memory overhead, hence the perf regression. The
counter is also not accurate due to the high volume of concurrent r/w.
For example, with an iperf workload of 128 threads and RPS enabled,
we saw a perf regression of 25% after the previous patch added the
counters. This patch eliminates the regression.
Since the error path of mana_poll_rx_cq() already has warnings,
keeping the counter and converting it to a per-queue variable is not
necessary. So, just remove this counter from this high frequency
code path.
Also, remove the tx_cqes counter for the same reason. We have
warnings & other counters for errors on that path, and don't need
to count every normal cqe processing.
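The contention described above can be illustrated with a small user-space sketch. This is not the MANA driver code: the names `shared_cqes`, `queue_stats`, `count_shared` and `count_per_queue`, the queue count, and the 64-byte cache line size are all assumptions made for illustration. The sketch also uses an atomic for the shared counter so that the program is well defined; a plain shared variable incremented from many CPUs would lose updates, which matches the inaccuracy noted in the description. The second pattern shows the per-queue alternative that the description mentions, which the patch chose not to adopt and instead removed the counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_QUEUES 8            /* hypothetical number of queues/threads */
#define EVENTS_PER_QUEUE 100000 /* hypothetical completions per queue */
#define CACHE_LINE 64           /* assumed cache line size in bytes */

/* Pattern 1: a single counter written by every queue. Each increment
 * pulls the same cache line to the writing CPU. */
static atomic_uint_fast64_t shared_cqes;

/* Pattern 2: one counter per queue, aligned so that no two queues
 * write to the same cache line. */
struct queue_stats {
    _Alignas(CACHE_LINE) uint64_t cqes;
};
static struct queue_stats per_queue[NUM_QUEUES];

static void *poll_shared(void *arg)
{
    (void)arg;
    for (int i = 0; i < EVENTS_PER_QUEUE; i++)
        atomic_fetch_add_explicit(&shared_cqes, 1, memory_order_relaxed);
    return NULL;
}

static void *poll_per_queue(void *arg)
{
    struct queue_stats *qs = arg;
    for (int i = 0; i < EVENTS_PER_QUEUE; i++)
        qs->cqes++; /* only this thread writes this counter */
    return NULL;
}

/* Run all "queues" against the single shared counter. */
uint64_t count_shared(void)
{
    pthread_t t[NUM_QUEUES];
    atomic_store(&shared_cqes, 0);
    for (int q = 0; q < NUM_QUEUES; q++) {
        int rc = pthread_create(&t[q], NULL, poll_shared, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int q = 0; q < NUM_QUEUES; q++)
        pthread_join(t[q], NULL);
    return atomic_load(&shared_cqes);
}

/* Run all "queues" with private counters, summed only when read. */
uint64_t count_per_queue(void)
{
    pthread_t t[NUM_QUEUES];
    uint64_t total = 0;
    for (int q = 0; q < NUM_QUEUES; q++) {
        per_queue[q].cqes = 0;
        int rc = pthread_create(&t[q], NULL, poll_per_queue, &per_queue[q]);
        assert(rc == 0);
        (void)rc;
    }
    for (int q = 0; q < NUM_QUEUES; q++) {
        pthread_join(t[q], NULL);
        total += per_queue[q].cqes;
    }
    return total;
}
```

Both functions return the same total, `NUM_QUEUES * EVENTS_PER_QUEUE`. The difference is in where the writes land: timing the two calls on a multi-core machine would be one way to observe the cost of the shared cache line, though the size of the gap depends on the hardware and is not something this sketch claims to reproduce from the 25% figure above.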
[Test Plan]
MSFT tested
[Regression potential]
The removed counters may be in use by user space programs.
[Other Info]
SF: #00361807
affects: | linux (Ubuntu) → linux-azure (Ubuntu) |
Changed in linux-azure (Ubuntu Lunar): | |
assignee: | nobody → Tim Gardner (timg-tpi) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in linux-azure (Ubuntu): | |
status: | New → Fix Released |
Changed in linux-azure (Ubuntu Lunar): | |
status: | In Progress → Fix Committed |
tags: |
added: verification-done-lunar removed: verification-needed-lunar |
This bug is awaiting verification that the linux-azure/6.2.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!