nova-compute runaway memory allocation
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Committed | High | Russell Bryant |
nova (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
This morning I upgraded my OpenStack hosts to the latest packaged version of the nova packages. After the upgrade, some of the compute nodes became largely unresponsive.
After investigation it became apparent that nova-compute was nearly exhausting the node's memory. I captured as much data as I dared before having to terminate the process.
The symptoms seem to have gone away after restarting nova-compute. This occurred on three of five compute nodes, but I have not (yet) been able to reproduce it since the restart.
$ ps axfuwww | grep compute
nova 17606 0.0 0.0 48040 2220 ? Ss 13:19 0:00 su -s /bin/sh -c exec nova-compute --flagfile=
nova 17607 4.3 48.6 14240940 12000856 ? Dl 13:19 0:13 \_ /usr/bin/python /usr/bin/
$ sudo strace -p 17607
Process 17607 attached - interrupt to quit
brk(0x22db06000) = 0x22db06000
brk(0x22db27000) = 0x22db27000
brk(0x22db48000) = 0x22db48000
[...]
brk(0x22e7c9000) = 0x22e7c9000
brk(0x22e7ea000) = 0x22e7ea000
brk(0x22e80b000) = 0x22e80b000
^CProcess 17607 detached
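For anyone reading the strace excerpt above: each brk() call moves the program break (the end of the heap) upward, and every visible step is 0x21000 bytes (132 KiB), which is consistent with steady heap growth. The following is a minimal sketch for measuring those steps from captured strace output. The function names and the sample string are illustrative assumptions, not part of nova or strace.

```python
import re

# Matches brk() lines as printed by strace, e.g.
# "brk(0x22db06000) = 0x22db06000". The second group is the returned break.
BRK_RE = re.compile(r"brk\((0x[0-9a-fA-F]+)\)\s*=\s*(0x[0-9a-fA-F]+)")


def brk_addresses(strace_text):
    """Return the program-break address returned by each brk() call, in order."""
    return [int(m.group(2), 16) for m in BRK_RE.finditer(strace_text)]


def brk_steps(addresses):
    """Return the byte difference between consecutive program breaks."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


# First three lines of the excerpt in this report, used as sample input.
sample = """\
brk(0x22db06000) = 0x22db06000
brk(0x22db27000) = 0x22db27000
brk(0x22db48000) = 0x22db48000
"""

addrs = brk_addresses(sample)
steps = brk_steps(addrs)
print(steps)  # [135168, 135168]
```

Between the first and last lines shown in the excerpt, the break moves from 0x22db06000 to 0x22e80b000, about 13 MiB, during the short time strace was attached.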
/var/log/
[...]
2012-02-22 13:21:12,543 DEBUG nova.compute.
2012-02-22 13:21:12,544 AUDIT nova.compute.
2012-02-22 13:21:12,628 DEBUG nova.rpc.common [-] Making asynchronous call on network ... from (pid=17607) multicall /usr/lib/
2012-02-22 13:21:12,629 DEBUG nova.rpc.common [-] MSG_ID is 8f75c505d29a412
2012-02-22 13:21:12,629 DEBUG nova.rpc.common [-] Pool creating new connection from (pid=17607) create /usr/lib/
2012-02-22 13:21:12,633 DEBUG amqplib [-] Start from server, version: 8.0, properties: {u'information': u'Licensed under the MPL. See http://
2012-02-22 13:21:12,634 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/
2012-02-22 13:21:12,635 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/
2012-02-22 13:21:12,636 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:12,637 DEBUG nova.compute.
2012-02-22 13:21:12,638 DEBUG nova.rpc.common [-] Making asynchronous cast on network... from (pid=17607) cast /usr/lib/
2012-02-22 13:21:12,640 DEBUG amqplib [-] Open OK! known_hosts [] from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:12,640 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/
2012-02-22 13:21:12,641 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:12,642 INFO nova.rpc.common [-] Connected to AMQP server on 10.55.58.1:5672
2012-02-22 13:21:12,643 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/
2012-02-22 13:21:12,644 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/
2012-02-22 13:21:12,645 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:12,646 DEBUG nova.compute.
2012-02-22 13:21:14,302 INFO nova.virt.
2012-02-22 13:21:14,302 INFO nova.virt.
2012-02-22 13:21:15,283 INFO nova.virt.
2012-02-22 13:21:15,286 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/
2012-02-22 13:21:15,287 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/
2012-02-22 13:21:15,288 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:15,288 DEBUG nova.compute.
2012-02-22 13:21:15,289 DEBUG nova.rpc.common [-] Making asynchronous cast on network... from (pid=17607) cast /usr/lib/
2012-02-22 13:21:15,290 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/
2012-02-22 13:21:15,291 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/
2012-02-22 13:21:15,292 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:15,292 DEBUG nova.compute.
2012-02-22 13:21:16,987 INFO nova.virt.
2012-02-22 13:21:16,987 INFO nova.virt.
2012-02-22 13:21:17,918 INFO nova.virt.
2012-02-22 13:21:17,918 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/
2012-02-22 13:21:17,919 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/
2012-02-22 13:21:17,920 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:17,921 DEBUG nova.compute.
2012-02-22 13:21:17,922 DEBUG nova.rpc.common [-] Making asynchronous cast on network... from (pid=17607) cast /usr/lib/
2012-02-22 13:21:17,924 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/
2012-02-22 13:21:17,925 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/
2012-02-22 13:21:17,926 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/
2012-02-22 13:21:17,926 DEBUG nova.compute.
2012-02-22 13:21:19,680 INFO nova.virt.
2012-02-22 13:21:19,681 INFO nova.virt.
^^^ The log was not being updated with additional information while nova-compute was running away
$ dpkg-query --show nova-compute
nova-compute 2012.1~
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_
DISTRIB_
DISTRIB_
Please let me know if I can help any further.
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → High |
Changed in nova: | |
assignee: | nobody → Russell Bryant (russellb) |
status: | Confirmed → Fix Committed |
milestone: | none → essex-4 |
Status changed to 'Confirmed' because the bug affects multiple users.