Comment 91 for bug 1833281

In , bugzilla (bugzilla-linux-kernel-bugs) wrote :

I primarily test by building webkitgtk [1], and I experience the same loss of system responsiveness whether / is ext4 or Btrfs. But I do see a difference in top and iotop.
https://drive.google.com/open?id=12jpQeskPsvHmfvDjWSPOwIWSz09JIUlk

This is an extreme case of refaulting: the system is out of both memory and swap, and since kswapd and the btrfs threads are using a lot of CPU, I'm guessing the faults are a mix of anonymous pages and file pages. At this point the system is really lost, which is why the UX is the same with ext4 and Btrfs, but behind the scenes more does seem to be going on. There might be other workloads that aren't as extreme and would expose the difference. Two possible sources of the heavy CPU use by the btrfs threads are decompression and checksumming. If it's true that reclaim is happening nearly constantly, then each refault of a compressed file isn't a simple 4K read but a read of a whole compressed extent, which covers up to 128K of data; that extent then has to be decompressed, and the csum tree has to be read and a csum computed on the read to compare. Ordinarily this is cheap, but in this situation it's possibly resulting in a lot of extra congestion. This is the limit of my knowledge, so it's just speculation.

Btrfs write amplification is a known issue (the wandering trees problem), but it does not appear to be the issue in this example.

It might be that this problem is better dealt with by cgroups v2, protecting certain tasks from reclaim and thus reducing the problem on any file system. But Btrfs alone (for now) also has more sophisticated cgroups v2 IO isolation control.
https://www.spinics.net/lists/cgroups/msg24743.html

The upstream GNOME and KDE developers are aware of the loss of responsiveness problem and have done quite a lot of preliminary work in GNOME 3.34 with more work on the way.
https://blogs.gnome.org/benzea/2019/10/01/gnome-3-34-is-now-managed-using-systemd/

You can take advantage of this cgroups v2 work today by running resource-hungry tasks as a systemd user unit in Fedora 31.
https://blogs.gnome.org/benzea/2019/10/01/gnome-3-34-is-now-managed-using-systemd/#comment-14833
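As a rough sketch of what "running as a systemd user unit" can look like, the helper below builds a systemd-run command line that puts a build into a transient user scope with cgroups v2 limits. The helper name, the 6G / 50 / 50 values and the choice of MemoryHigh, CPUWeight and IOWeight are my assumptions for illustration, not taken from the linked comment; whether these controllers are actually delegated to the user session depends on the distribution and systemd version.

```python
import shlex


def systemd_run_command(build_cmd, memory_high="6G", cpu_weight=50, io_weight=50):
    """Wrap build_cmd in a transient systemd user scope.

    The limit values are illustrative assumptions; tune them for the machine.
    """
    return [
        "systemd-run", "--user", "--scope",
        "-p", f"MemoryHigh={memory_high}",  # throttle and reclaim above this
        "-p", f"CPUWeight={cpu_weight}",    # below the default weight of 100
        "-p", f"IOWeight={io_weight}",      # below the default weight of 100
        "--", *build_cmd,
    ]


if __name__ == "__main__":
    # Only prints the command; paste it into a shell to actually run it.
    print(shlex.join(systemd_run_command(["make", "-j", "10"])))
```

MemoryHigh is used here rather than a hard cap because it throttles the build instead of killing it, which seems closer to the goal of keeping the desktop responsive at the build's expense.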

I expect in the next 6-12 months (it's a guesstimate) there will be additional work in GNOME to protect the user session or what I vaguely call the "GUI stack" from reclaim, and thus improve its responsiveness at the expense of the resource hungry process.

[1] First two lines at the link below; set -j to the RAM size in GiB plus 2, i.e. if you have 8G RAM, use -j 10. More jobs make the problem happen faster.
https://trac.webkit.org/wiki/BuildingGtk
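
For anyone scripting the reproducer, here is a small sketch of the -j rule from [1]. The function names are hypothetical, and rounding MemTotal to the nearest GiB is my assumption, since the kernel reports slightly less than the installed RAM.

```python
def jobs_for_ram(ram_gib):
    """Job count per the rule above: RAM in GiB (rounded) plus 2."""
    return round(ram_gib) + 2


def ram_gib_from_meminfo(text):
    """Extract MemTotal from /proc/meminfo contents and convert kB to GiB."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        ram = ram_gib_from_meminfo(f.read())
    print(f"-j {jobs_for_ram(ram)}")
```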