backport arm64 THP improvements from 6.9

Bug #2059316 reported by dann frazier
Affects: linux (Ubuntu)
Status: In Progress
Importance: Undecided
Assigned to: dann frazier

Bug Description

Initial support for multi-size THP (mTHP) landed upstream in v6.8. In the 6.9 merge window, two further series landed that show significant performance improvements on arm64:

mm/memory: optimize fork() with PTE-mapped THP
  https://lkml.iu.edu/hypermail/linux/kernel/2401.3/02766.html

Transparent Contiguous PTEs for User Mappings
  https://lwn.net/Articles/962330/

On an Ampere AltraMax system with a 4K page size, a kernel build in a tmpfs drops from 6m30s to 5m17s, a ~19% improvement.
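
As a quick check of the arithmetic behind the ~19% figure, the sketch below converts the two wall-clock times quoted above to seconds and computes the relative reduction. The helper names `to_seconds` and `reduction_percent` are illustrative only and do not come from the bug report.

```python
import re


def to_seconds(stamp: str) -> int:
    """Convert a duration written like '6m30s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def reduction_percent(before: str, after: str) -> float:
    """Relative reduction in build time, as a percentage of the 'before' time."""
    before_s = to_seconds(before)
    after_s = to_seconds(after)
    return 100.0 * (before_s - after_s) / before_s


if __name__ == "__main__":
    # 6m30s = 390 s, 5m17s = 317 s, so 73 s are saved out of 390 s.
    print(f"{reduction_percent('6m30s', '5m17s'):.1f}% less build time")
```

The result is about 18.7%, which the description rounds to ~19%.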

It has also been reported that this can yield a *10x* improvement for certain GPU workloads on ARM:

https://lwn.net/Articles/954094/

dann frazier (dannf)
Changed in linux (Ubuntu):
assignee: nobody → dann frazier (dannf)
dann frazier (dannf)
Changed in linux (Ubuntu):
status: New → In Progress
dann frazier (dannf) wrote:

I've build-tested on all architectures in this PPA:
  https://launchpad.net/~dannf/+archive/ubuntu/mthp/+packages

I manually tested on a few systems, timing a full kernel build with the Ubuntu config in a tmpfs (both to stress the system and to look for performance differences).

- ppc64el (Power9): no performance difference
- x86 (AMD EPYC/Naples): 11% improvement
- arm64 (Ampere AltraMax, -generic): 19% improvement
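
For anyone repeating these measurements, it may help to record which multi-size THP sizes are enabled on the test system, since the results are likely to depend on that configuration. The following is a minimal sketch, assuming the per-size sysfs controls that upstream introduced with mTHP in v6.8 (`/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled`). The function name `mthp_settings` and its `base` parameter are hypothetical, and this check was not part of the testing described above.

```python
import re
from pathlib import Path

# Assumed location of the THP controls on a kernel with mTHP support.
THP_SYSFS = Path("/sys/kernel/mm/transparent_hugepage")
SIZE_DIR = re.compile(r"hugepages-(\d+)kB")


def mthp_settings(base: Path = THP_SYSFS) -> dict[int, str]:
    """Map each mTHP size in kB to its active 'enabled' policy.

    Each control file lists every policy and marks the active one with
    brackets, for example: "always inherit madvise [never]".
    """
    sizes: dict[int, str] = {}
    if not base.is_dir():
        return sizes  # kernel without THP, or a non-Linux system
    for entry in base.iterdir():
        match = SIZE_DIR.fullmatch(entry.name)
        control = entry / "enabled"
        if match is None or not control.is_file():
            continue
        words = control.read_text().split()
        active = [w[1:-1] for w in words if w.startswith("[") and w.endswith("]")]
        sizes[int(match.group(1))] = active[0] if active else "unknown"
    return dict(sorted(sizes.items()))


if __name__ == "__main__":
    for size_kb, policy in mthp_settings().items():
        print(f"{size_kb:>8} kB: {policy}")
```

On a kernel without these controls the function returns an empty mapping, so the script prints nothing.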

description: updated