Comment 29 for bug 1833281

netwiz (netwiz-linux-kernel-bugs) wrote:

Created attachment 258069
8Gb-noswap.tar.gz

On Wednesday, 23 August 2017 11:38:48 PM AEST Michal Hocko wrote:
> On Tue 22-08-17 15:55:30, Andrew Morton wrote:
> > (switched to email. Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
>
> > On Tue, 22 Aug 2017 11:17:08 +0000 <email address hidden> wrote:
> [...]
>
> > > Sadly I haven't been able to capture this information
> > > fully yet due to said unresponsiveness.
>
> Please try to collect /proc/vmstat in the background and provide the
> collected data. Something like
>
> while true
> do
> cat /proc/vmstat > vmstat.$(date +%s)
> sleep 1s
> done
>
> If the system turns out so busy that it won't be able to fork a process
> or write the output (which you will see by checking timestamps of files
> and looking for holes) then you can try the attached proggy
> ./read_vmstat output_file timeout output_size
>
> Note you might need to increase the mlock rlimit to lock everything into
> memory.

Thanks Michal,

I have upgraded PCs since I first put this data together. However, I was
able to trigger strange behaviour on the new system by pulling out one 8 GB
RAM stick, leaving it with only 8 GB of RAM.

All of these tests were performed on Fedora 26 with kernel
4.12.8-300.fc26.x86_64.

I have attached 3 files with output.

8Gb-noswap.tar.gz contains the output of /proc/vmstat running on 8 GB of RAM
with no swap. In this scenario, I expected the OOM killer to simply kill the
game once the memory allocated exceeded the amount of physical RAM.
Interestingly, you'll notice a massive hang in the output before the game is
terminated. I had not seen this before.
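
For anyone reading the archives, a hang like this can be located by looking
for holes in the snapshot timestamps. The following is only a minimal sketch:
it assumes the snapshots are files named vmstat.<epoch seconds> in a single
directory, and find_gaps is a hypothetical helper name, not part of any
existing tool.

```shell
#!/bin/sh
# Report holes in a series of vmstat.<epoch> snapshot files.
# Assumption: one snapshot per second, each named vmstat.<epoch seconds>.
# find_gaps is a hypothetical helper, not an existing utility.
find_gaps() {
    dir=${1:-.}
    max_gap=${2:-2}   # largest step (in seconds) still treated as normal

    # Print the epoch suffix of every snapshot, sort numerically, and
    # report each step between neighbours that is larger than max_gap.
    for f in "$dir"/vmstat.*; do
        [ -e "$f" ] || continue
        printf '%s\n' "${f##*.}"
    done | sort -n | awk -v max="$max_gap" '
        NR > 1 && $1 - prev > max {
            printf "gap of %d s between %d and %d\n", $1 - prev, prev, $1
        }
        { prev = $1 }'
}
```

Running find_gaps on the unpacked directory prints one line per hole; the
second argument raises or lowers the threshold if the default of 2 seconds
is too noisy.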

8Gb-swap-on-file.tar.gz contains the output of /proc/vmstat, still with 8 GB
of RAM, but with an 8 GB swap file (/swapfile) created on the PCIe SSD
via:
 # dd if=/dev/zero of=/swapfile bs=1G count=8
 # mkswap /swapfile
 # swapon /swapfile

Some timings (all in UTC+10):
23:58:30 - Start loading the saved game
23:59:38 - Load ok, all running fine
00:00:15 - Load Chrome
00:01:00 - Quit the game

The game seemed to run OK with no real issues, and a lot was swapped out to
the swap file. I'm wondering if it was purely the speed of the PCIe SSD that
made it appear fine, as creating the file with dd completed at ~1.4 GB/s.
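
To put a number on "a lot was swapped", the pswpin and pswpout counters in
/proc/vmstat (both counted in pages) can be compared between two snapshots.
This is a rough sketch only; swap_delta is a hypothetical helper name, and
the two arguments are assumed to be snapshot files copied from /proc/vmstat.

```shell
#!/bin/sh
# Print how far pswpin and pswpout advanced between two vmstat snapshots.
# Assumption: both files hold "name value" lines as found in /proc/vmstat.
# swap_delta is a hypothetical helper, not an existing utility.
swap_delta() {
    awk '
        FNR == NR { first[$1] = $2; next }   # first file: remember counters
        $1 == "pswpin" || $1 == "pswpout" {  # second file: print difference
            printf "%s %d\n", $1, $2 - first[$1]
        }' "$1" "$2"
}
```

For example, swap_delta vmstat.<first> vmstat.<last> prints the number of
pages swapped in and out over that interval.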

8Gb-swap-on-ssd.tar.gz contains the output of /proc/vmstat after adding a
32 GB SATA SSD to the system and using the entire block device as swap via:
 # mkswap -f /dev/sda
 # swapon /dev/sda

There were many pauses and periods of unresponsiveness while this was
loading, but it eventually got there.

Some timings (all in UTC+10 again):
00:06:33 - Load the saved game
00:11:22 - Saved game loaded - somewhat responsive
00:12:00 - Load Chrome
00:13:07 - Quit the game + chrome

For the sake of information, the following is a speed test on the SSD in
question:
# dd if=/dev/zero of=/dev/sda bs=1M count=8192 conv=fsync
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 44.923 s, 191 MB/s
# dd if=/dev/sda of=/dev/null bs=1M count=8192 conv=fsync
dd: fsync failed for '/dev/null': Invalid argument
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 30.7414 s, 279 MB/s

Running the game on the exact same system with 16 GB of RAM and no swap works
perfectly, even with multitasking, as we never end up filling physical RAM.

As some data is still missing, should I still attempt to compile and run the
program provided? I'm not quite clear on the mlock rlimit remark, as I
haven't really had to debug anything like this before.