Undercloud sometimes OOM kills things in CI

Bug #1536136 reported by Steven Hardy
Affects: tripleo
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

See https://review.openstack.org/#/c/260413/7

In this case we ran out of memory and the OOM killer terminated a heat-engine worker, with the predictable result that the CI test failed.

We need a better solution; even slowing things down a little by configuring swap is better than randomly killing processes by default.

It's probably not what developers want to deal with from the default VM setup either.
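
For context, a minimal sketch of what "configuring swap" on the undercloud VM could look like; the swap file path and size below are illustrative values of my own, not anything prescribed by this bug or by tripleo:

#!/usr/bin/env python3
"""Sketch: create and enable a swap file so the OOM killer is less
likely to pick off service workers. Path and size are assumptions."""
import subprocess

SWAPFILE = "/swapfile"   # hypothetical path
SIZE_MB = 2048           # hypothetical size

def add_swap():
    # Allocate the file, lock down permissions, format it as swap, enable it.
    subprocess.check_call(
        ["dd", "if=/dev/zero", "of=%s" % SWAPFILE, "bs=1M", "count=%d" % SIZE_MB])
    subprocess.check_call(["chmod", "600", SWAPFILE])
    subprocess.check_call(["mkswap", SWAPFILE])
    subprocess.check_call(["swapon", SWAPFILE])
    # Persist across reboots.
    with open("/etc/fstab", "a") as fstab:
        fstab.write("%s none swap defaults 0 0\n" % SWAPFILE)

if __name__ == "__main__":
    add_swap()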

Revision history for this message
Steven Hardy (shardy) wrote :

Note that I think this is more likely to happen now that we have increased the CI vCPU count, as most projects use oslo processutils, which defaults to launching one worker process per core and therefore uses more memory.
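
To illustrate the scaling being described, here is a rough paraphrase of the common oslo-style per-core default (not the actual processutils source); the memory figure in the example is illustrative, not measured from this bug:

import multiprocessing

def default_worker_count():
    # oslo-style services commonly default their number of API/engine
    # workers to the CPU count, falling back to 1 if it is unknown.
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        return 1

# Illustrative numbers only: if one worker holds ~150 MB resident, going
# from 1 vCPU to 4 vCPUs roughly quadruples that service's footprint.
workers = default_worker_count()
print("workers per service: %d, approx RSS: ~%d MB" % (workers, workers * 150))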

Perhaps we need to increase the memory allowed in CI a little more.

Revision history for this message
Derek Higgins (derekh) wrote :

Looking at an undercloud (in CI) while it is deploying, the top memory consumers are:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21004 heat 20 0 808064 485444 2928 R 75.0 9.6 18:09.16 heat-engine
21005 heat 20 0 617492 294752 2808 S 13.2 5.8 15:08.34 heat-engine
15211 mysql 20 0 1575892 241784 4940 S 6.9 4.8 3:21.39 mysqld
19989 nova 20 0 505228 169188 2560 S 27.3 3.3 3:32.31 nova-api
19988 nova 20 0 506928 169168 2952 S 11.2 3.3 3:37.26 nova-api
20003 nova 20 0 481380 155520 1996 S 0.0 3.1 0:26.20 nova-api
20004 nova 20 0 481244 155508 1996 S 0.0 3.1 0:17.71 nova-api
19949 nova 20 0 453008 128524 4124 R 1.6 2.5 1:02.90 nova-api
22557 nova 20 0 437680 113972 4224 S 0.0 2.2 1:04.07 nova-compute
20082 nova 20 0 391476 100780 2100 S 0.0 2.0 2:52.78 nova-conductor
20083 nova 20 0 391256 100612 2100 S 0.7 2.0 2:53.96 nova-conductor
19902 nova 20 0 402068 98516 4144 S 0.0 1.9 0:13.49 nova-cert
16971 rabbitmq 20 0 1355748 94020 2020 S 2.3 1.9 1:35.68 beam.smp
20241 neutron 20 0 403212 89424 2004 S 0.0 1.8 2:03.29 neutron-server

The two heat-engine processes are using roughly 780 MB in total.

Should we be expecting heat to need this much RAM?
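
As a quick sanity check of that total, summing the RES column values for the two heat-engine processes from the top output above (the rounding is mine):

# RES values in KiB for PIDs 21004 and 21005 from the top output above.
heat_engine_res_kib = [485444, 294752]

total_kib = sum(heat_engine_res_kib)
print("%d KiB ~= %.0f MiB (~0.8 GB)" % (total_kib, total_kib / 1024.0))
# -> 780196 KiB ~= 762 MiB, i.e. the roughly 780 MB quoted above.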

We have 4 options here:
1. Investigate whether the heat memory footprint can be reduced.
2. Bump up the RAM in CI a little more (it's at 5 GB).
3. Add some swap to the undercloud.
 - testing a p...

Revision history for this message
Steven Hardy (shardy) wrote :

We fixed this by adding more RAM and tuning the deployed services.

Changed in tripleo:
status: New → Invalid
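
For a concrete idea of what "tuning the deployed services" can mean in practice, here is a minimal sketch that caps per-service worker counts on a memory-constrained undercloud; the file paths, option names, and values are illustrative assumptions of my own, not the actual fix applied here:

import configparser

# (config file, section, option, value) - illustrative examples of common
# OpenStack worker-count options, not the actual changes made for this bug.
TUNABLES = [
    ("/etc/heat/heat.conf", "DEFAULT", "num_engine_workers", "2"),
    ("/etc/nova/nova.conf", "DEFAULT", "osapi_compute_workers", "2"),
    ("/etc/neutron/neutron.conf", "DEFAULT", "api_workers", "2"),
]

for path, section, option, value in TUNABLES:
    # strict=False / interpolation=None keep configparser tolerant of the
    # duplicate keys and '%' characters that sometimes appear in these files.
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read(path)
    if section != "DEFAULT" and not cfg.has_section(section):
        cfg.add_section(section)
    cfg.set(section, option, value)
    # Note: rewriting with configparser drops comments; fine for a sketch.
    with open(path, "w") as f:
        cfg.write(f)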