Comment 34 for bug 1833281

In , netwiz (netwiz-linux-kernel-bugs) wrote:

Created attachment 258079
signature.asc

On Thursday, 24 August 2017 10:41:39 PM AEST Michal Hocko wrote:
> On Thu 24-08-17 00:30:40, Steven Haigh wrote:
> > On Wednesday, 23 August 2017 11:38:48 PM AEST Michal Hocko wrote:
> > > On Tue 22-08-17 15:55:30, Andrew Morton wrote:
> > > > (switched to email. Please respond via emailed reply-to-all, not via
> > > > the bugzilla web interface).
> > > >
> > > > On Tue, 22 Aug 2017 11:17:08 +0000 <email address hidden>
> > > > wrote:
> > > [...]
> > >
> > > > > Sadly I haven't been able to capture this information
> > > > > fully yet due to said unresponsiveness.
> > >
> > > Please try to collect /proc/vmstat in the background and provide the
> > > collected data. Something like
> > >
> > > while true
> > > do
> > >
> > > cat /proc/vmstat > vmstat.$(date +%s)
> > > sleep 1s
> > >
> > > done
> > >
> > > If the system turns out so busy that it won't be able to fork a process
> > > or write the output (which you will see by checking timestamps of files
> > > and looking for holes) then you can try the attached proggy
> > > ./read_vmstat output_file timeout output_size
> > >
> > > Note you might need to increase the mlock rlimit to lock everything into
> > > memory.
> >
> > Thanks Michal,
> >
> > I have upgraded PCs since I initially put together this data - however I
> > was able to get strange behaviour by pulling out an 8Gb RAM stick in my
> > new system - leaving it with only 8Gb of RAM.
> >
> > All these tests are performed with Fedora 26 and kernel
> > 4.12.8-300.fc26.x86_64
> >
> > I have attached 3 files with output.
> >
> > 8Gb-noswap.tar.gz contains the output of /proc/vmstat running on 8Gb of
> > RAM
> > with no swap. Under this scenario, I was expecting the OOM reaper to just
> > kill the game when memory allocated became too high for the amount of
> > physical RAM. Interestingly, you'll notice a massive hang in the output
> > before the game is terminated. I didn't see this before.
>
> I have checked a few gaps. E.g. vmstat.1503496391 vmstat.1503496451 which
> is one minute. The most notable thing is that there are only very few
> pagecache pages
> [base] [diff]
> nr_active_file 1641 3345
> nr_inactive_file 1630 4787
>
> So there is not much to reclaim without swap. The more important thing
> is that we keep reclaiming and refaulting that memory
>
> workingset_activate 5905591 1616391
> workingset_refault 33412538 10302135
> pgactivate 42279686 13219593
> pgdeactivate 48175757 14833350
>
> pgscan_kswapd 379431778 126407849
> pgsteal_kswapd 49751559 13322930
>
> so we are effectively thrashing over the very small amount of
> reclaimable memory. This is something that we cannot detect right now.
> It is even questionable whether the OOM killer would be an appropriate
> action. Your system has recovered and then it is always hard to decide
> whether a disruptive action is more appropriate. One minute of
> unresponsiveness is certainly annoying though. Your system is obviously
> under-provisioned for the load you want to run.
>
> It is quite interesting to see that we do not really have too many
> direct reclaimers during this time period
> allocstall_normal 30 1
> allocstall_movable 490 88
> pgscan_direct_throttle 0 0
> pgsteal_direct 24434 4069
> pgscan_direct 38678 5868
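
The snapshot loop suggested earlier in the mail can be wrapped into a self-contained helper; a minimal sketch, assuming a writable output directory (the function name and the count argument are illustrative additions so the loop can be stopped):

```shell
# Sketch of the snapshot loop: copy /proc/vmstat into a timestamped
# file once per second, for a fixed number of iterations.
collect_vmstat() {
    outdir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]
    do
        cat /proc/vmstat > "$outdir/vmstat.$(date +%s)"
        i=$((i + 1))
        sleep 1
    done
}
```

Holes in the resulting timestamps then point to periods where the system was too busy to run even this loop.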

Yes, I understand that the system is really not suitable - however I believe
the test is useful - even from an informational point of view :)
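
For reference, the base/diff pairs quoted above can be reproduced from any two of the collected snapshot files; a small sketch (the helper name is made up):

```shell
# Print "counter base delta" for every counter that changed between
# two /proc/vmstat snapshots (first argument is the older file).
diff_vmstat() {
    awk 'NR == FNR { base[$1] = $2; next }
         $1 in base && $2 != base[$1] {
             printf "%s %d %d\n", $1, base[$1], $2 - base[$1]
         }' "$1" "$2"
}
```

E.g. `diff_vmstat vmstat.1503496391 vmstat.1503496451` would list the counters that moved during that one-minute gap.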

> > 8Gb-swap-on-file.tar.gz contains the output of /proc/vmstat still with 8Gb
> > of RAM - but with swap on a file (/swapfile, 8Gb in size) on the PCIe SSD,
> > created via:
> > # dd if=/dev/zero of=/swapfile bs=1G count=8
> > # mkswap /swapfile
> > # swapon /swapfile
> >
> > Some times (all in UTC+10):
> > 23:58:30 - Start loading the saved game
> > 23:59:38 - Load ok, all running fine
> > 00:00:15 - Load Chrome
> > 00:01:00 - Quit the game
> >
> > The game seemed to run ok with no real issue - and a lot was swapped to
> > the swap file. I'm wondering if it was purely the speed of the PCIe SSD
> > that made things appear fine - as the creation of the file with dd
> > completed at ~1.4GB/sec.
>
> Swap IO tends to be really scattered, and IO performance is not really
> great even on fast storage AFAIK.
>
> Anyway your original report sounded like a regression. Were you able to
> run the _same_ workload on an older kernel without these issues?

When I ran the same tests with swap on an SSD under kernel 4.10.x (I believe
the latest I tried was 4.10.25?), swap on the SSD did not cause any issues
or periods of system unresponsiveness.

The file attached in the original bug report "vmstat-4.10.17-10Gb.log" was
taken on my old system with 10Gb of RAM - and there were no significant pauses
while swapping.

I do find it interesting that the newer '8Gb-swap-on-file.tar.gz' does not
show any issues. It may be worth repeating the test with a swap file on the
same SSD that was used as the swap disk in '8Gb-swap-on-ssd.tar.gz', so the
device stays constant - a file on the SSD instead of the entire block
device. That would at least expose any issues on the same device in file vs
block mode. It might also show whether the difference comes simply from
having the file on a much (much!) faster drive.
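
If that file-vs-block comparison is attempted, it helps to record which kind of swap was active during each run; a minimal sketch (the helper name is made up) that lists the active swap entries and their type from /proc/swaps:

```shell
# Print "path type" for each active swap entry; the Type column of
# /proc/swaps distinguishes "file" from "partition".
swap_types() {
    awk 'NR > 1 { print $1, $2 }' "${1:-/proc/swaps}"
}
```

Capturing this alongside the vmstat snapshots would make the two runs unambiguous when comparing logs later.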