possible leak in upstart 1.5

Bug #1198180 reported by Dawid Stawiarski
This bug affects 4 people
Affects: upstart (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Description: Ubuntu 12.04.2 LTS
Release: 12.04
Package: upstart 1.5-0ubuntu7.2
Kernel: Linux XXX 3.8.0-23-generic #34~precise1-Ubuntu SMP Wed May 29 21:12:31 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

We're starting lxc containers on demand, so lots and lots of network interfaces are going up and down. Udev emits a signal that goes to upstart-udev-bridge, and on to upstart itself. After about 40K containers created/destroyed, upstart takes about 250 MB of RAM, and it grows with every event received/sent. Not only is RAM being consumed, but init also spends more time on every new event arriving on the socket.

What happens:
init uses more RAM and more CPU over its lifetime when lots of network interfaces are being created/destroyed

What is expected:
init's resource usage should not change depending on the number of events served

Technical detail (pmap -x excerpt):
# pmap -x 1
Address Kbytes RSS Dirty Mode Mapping
00007f446f865000 0 237036 237036 rw--- [ anon ]
00007fffc694d000 0 24 24 rw--- [ stack ]
00007fffc69fe000 0 4 0 r-x-- [ anon ]
ffffffffff600000 0 0 0 r-x-- [ anon ]
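
For reference, interface churn of this kind can be approximated without lxc (a sketch; it assumes iproute2 is installed and that veth hotplug events take the same udev path as lxc's interfaces):

# create and destroy veth pairs in a loop while watching init's RSS
while true; do
    ip link add veth-test0 type veth peer name veth-test1
    ip link set veth-test0 up
    ip link del veth-test0        # deleting one end removes the whole pair
    grep VmRSS /proc/1/status     # init's resident set; it should stay flat
done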

Revision history for this message
James Hunt (jamesodhunt) wrote :

Approximately how many running jobs do you have after you've created 40k containers? If you stop all those network interfaces, is the memory reclaimed?

Please provide further details so we can investigate more fully.
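
One way to collect the numbers asked for above (a sketch; the grep pattern assumes initctl's usual 'goal/state' output format):

$ initctl list | grep -c 'start/running'    # running job instances
$ grep VmRSS /proc/1/status                 # init's resident memory, for comparison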

Revision history for this message
Dawid Stawiarski (neeo) wrote :

To clarify: we have about 200-300 containers running simultaneously, and the average lifetime of a single container is 2-10 minutes; the network interfaces are destroyed when the container goes down. So, at any given point in time, we have about 200-300 network interfaces, created and destroyed dynamically (with lxc-start and lxc-stop).

Stracing only the mmap and munmap calls of init (-p 1), you can see that memory is mapped for EVERY event, but only part of it is unmapped:

[pid 3058] 08:43:07.444810 mmap(NULL, 677, PROT_READ, MAP_SHARED, 11, 0) = 0x7ffe40a7e000
[pid 3058] 08:43:07.444942 munmap(0x7ffe40a7e000, 677) = 0
[pid 3058] 08:43:07.446329 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f88f1b38000
[pid 3058] 08:43:07.446604 mmap(NULL, 26711, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f88f1b31000
[pid 3058] 08:43:07.446806 mmap(NULL, 3925208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f88f1559000
[pid 3058] 08:43:07.446856 mmap(0x7f88f190d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b4000) = 0x7f88f190d000
[pid 3058] 08:43:07.446965 mmap(0x7f88f1913000, 17624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f88f1913000
[pid 3058] 08:43:07.447137 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f88f1b30000
[pid 3058] 08:43:07.447200 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f88f1b2f000
[pid 3058] 08:43:07.447250 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f88f1b2e000
[pid 3058] 08:43:07.447490 munmap(0x7f88f1b31000, 26711) = 0
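
(For reference, a trace like the above can be captured with something along these lines; the exact flags used here are an assumption:)

# strace -f -tt -e trace=mmap,munmap -p 1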

We do not have any custom upstart jobs, so only /etc/init/network-interface*.conf are run. We also have machines where init uses even more resources (after creating/destroying about 200K containers):

    1 root 20 0 978m 951m 1360 S 77 0.7 3410:06 /sbin/init

It uses more and more with every event served, and no, the memory is NOT freed after shutting down the containers. Also, after some time init uses almost a full core (77% above) and gets sluggish: it takes more and more time to serve new events. The only workaround for us right now is to reboot the machine.
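
For anyone tracking this over time, the growth can be logged with something as simple as (a sketch; the log path is illustrative):

# record init's RSS and accumulated CPU time once a minute
while true; do
    echo "$(date '+%F %T') $(ps -o rss=,time= -p 1)"
    sleep 60
done >> /var/log/init-rss.log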

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu):
status: New → Confirmed
Revision history for this message
Paweł Hikiert (nsilent22) wrote :

I confirm this behaviour.
I use lxc for testing purposes. For each set of tests a few containers are created; each lives for only a few seconds, and the tests are run one after another continuously. After a few days init takes more than 2 GB of memory. What's interesting: after stopping the tests two days ago, upstart is still bringing virtual interfaces up and down for containers that no longer exist:
$ sudo lxc-list
RUNNING

FROZEN

STOPPED
$

$ sudo strace -ff -p 1 2>&1 | grep execve | grep if
[pid 7511] execve("/bin/sh", ["/bin/sh", "-e", "-c", "exec ifdown --allow auto $INTERF"..., "/bin/sh"], [/* 16 vars */] <unfinished ...>
[pid 7511] execve("/sbin/ifdown", ["ifdown", "--allow", "auto", "vetha9Ks2v"], [/* 18 vars */]) = 0
[pid 7513] execve("/bin/sh", ["/bin/sh", "-e", "-c", "exec ifdown --allow auto $INTERF"..., "/bin/sh"], [/* 16 vars */] <unfinished ...>
[pid 7513] execve("/sbin/ifdown", ["ifdown", "--allow", "auto", "vethDDYJtI"], [/* 18 vars */]) = 0
[pid 7514] execve("/bin/sh", ["/bin/sh", "-e", "-c", "exec ifdown --allow auto $INTERF"..., "/bin/sh"], [/* 16 vars */] <unfinished ...>
[pid 7514] execve("/sbin/ifdown", ["ifdown", "--allow", "auto", "vetheaOt3H"], [/* 18 vars */]) = 0
[pid 7523] execve("/sbin/ifup", ["ifup", "--allow", "auto", "vethcqVeRj"], [/* 17 vars */] <unfinished ...>
[pid 7522] execve("/sbin/ifup", ["ifup", "--allow", "auto", "veth0m5onm"], [/* 17 vars */]) = 0
[pid 7530] execve("/sbin/ifup", ["ifup", "--allow", "auto", "vethh6VSTB"], [/* 17 vars */] <unfinished ...>
[pid 7529] execve("/sbin/ifup", ["ifup", "--allow", "auto", "vethfgF5ty"], [/* 17 vars */] <unfinished ...>
[pid 7538] execve("/bin/sh", ["/bin/sh", "-e", "-c", "exec ifdown --allow auto $INTERF"..., "/bin/sh"], [/* 16 vars */] <unfinished ...>
[pid 7538] execve("/sbin/ifdown", ["ifdown", "--allow", "auto", "vethcqVeRj"], [/* 18 vars */]) = 0
[pid 7539] execve("/bin/sh", ["/bin/sh", "-e", "-c", "exec ifdown --allow auto $INTERF"..., "/bin/sh"], [/* 16 vars */] <unfinished ...>
[pid 7539] execve("/sbin/ifdown", ["ifdown", "--allow", "auto", "vethfgF5ty"], [/* 18 vars */] <unfinished ...>
[pid 7537] execve("/sbin/ifup", ["ifup", "--allow", "auto", "vethkx0wFy"], [/* 17 vars */] <unfinished ...>
[pid 7536] execve("/sbin/ifup", ["ifup", "--allow", "auto", "veth9z1M5B"], [/* 17 vars */]) = 0
[pid 7542] execve("/bin/sh", ["/bin/sh", "-e", "-c", "exec ifdown --allow auto $INTERF"..., "/bin/sh"], [/* 16 vars */] <unfinished ...>
[pid 7542] execve("/sbin/ifdown", ["ifdown", "--allow", "auto", "vethkx0wFy"], [/* 18 vars */]) = 0
[pid 7548] execve("/sbin/ifup", ["ifup", "--allow", "auto", "vethgClCAf"], [/* 17 vars */] <unfinished ...>
[pid 7547] execve("/sbin/ifup", ["ifup", "--allow", "auto", "veth8baiZ7"], [/* 17 vars */]) = 0
[pid 7560] execve("/bin/sh", ["/bin/sh", "-e", "-c", "exec ifdown --allow auto $INTERF"..., "/bin/sh"], [/* 16 vars */] <unfinished ...>
[pid 7560] execve("/sbin/ifdown", ["ifdown", "--allow", "auto", "veth8baiZ7"], [/* 18 vars */]) = 0
[pid 7558] execve("/sbin/ifup", ["ifup", "--allow", "auto", "vethTcUttE"], [/* 17 vars */] <unfinished ...>

$ sudo pmap -x 1
1: /sbin/init
Address Kbytes ...


Revision history for this message
Paweł Hikiert (nsilent22) wrote :

Update: I've found another memory hog: upstart-udev-bridge. It looks like stopping it (sudo stop upstart-udev-bridge) also stops the memory leak in the upstart (init) process.
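
That observation can be verified with a quick before/after comparison (a sketch):

$ grep VmRSS /proc/1/status        # baseline
$ sudo stop upstart-udev-bridge
$ # ...create/destroy a batch of containers or veth interfaces...
$ grep VmRSS /proc/1/status        # with the bridge stopped, RSS should stay flat
$ sudo start upstart-udev-bridge   # restore normal operation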

$ pmap -x 368
368: upstart-udev-bridge --daemon
Address Kbytes RSS Dirty Mode Mapping
---------------- ------ ------ ------
total kB 0 0 0
coding@wbec170:~$ sudo pmap -x 368
368: upstart-udev-bridge --daemon
Address Kbytes RSS Dirty Mode Mapping
00007f66c06af000 0 6988 6988 rw--- [ anon ]
00007f66c82b6000 0 260 260 rw--- [ anon ]
00007f66c9ede000 0 4 0 r-x-- librt-2.15.so
00007f66c9ee5000 0 0 0 ----- librt-2.15.so
00007f66ca0e4000 0 4 4 r---- librt-2.15.so
00007f66ca0e5000 0 0 0 rw--- librt-2.15.so
00007f66ca0e6000 0 228 0 r-x-- libc-2.15.so
00007f66ca29b000 0 0 0 ----- libc-2.15.so
00007f66ca49a000 0 8 8 r---- libc-2.15.so
00007f66ca49e000 0 8 8 rw--- libc-2.15.so
00007f66ca4a0000 0 12 12 rw--- [ anon ]
00007f66ca4a5000 0 4 0 r-x-- libudev.so.0.13.0
00007f66ca4b1000 0 0 0 ----- libudev.so.0.13.0
00007f66ca6b0000 0 4 4 r---- libudev.so.0.13.0
00007f66ca6b1000 0 0 0 rw--- libudev.so.0.13.0
00007f66ca6b2000 0 12 0 r-x-- libpthread-2.15.so
00007f66ca6ca000 0 0 0 ----- libpthread-2.15.so
00007f66ca8c9000 0 4 4 r---- libpthread-2.15.so
00007f66ca8ca000 0 0 0 rw--- libpthread-2.15.so
00007f66ca8cb000 0 4 4 rw--- [ anon ]
00007f66ca8cf000 0 144 0 r-x-- libdbus-1.so.3.5.8
00007f66ca911000 0 0 0 ----- libdbus-1.so.3.5.8
00007f66cab11000 0 4 4 r---- libdbus-1.so.3.5.8
00007f66cab12000 0 4 4 rw--- libdbus-1.so.3.5.8
00007f66cab13000 0 4 0 r-x-- libnih-dbus.so.1.0.0
00007f66cab1b000 0 0 0 ----- libnih-dbus.so.1.0.0
00007f66cad1b000 0 4 4 r---- libnih-dbus.so.1.0.0
00007f66cad1c000 0 0 0 rw--- libnih-dbus.so.1.0.0
00007f66cad1d000 0 8 0 r-x-- libnih.so.1.0.0
00007f66cad33000 0 0 0 ----- libnih.so.1.0.0
00007f66caf33000 0 4 4 r---- libnih.so.1.0.0
00007f66caf34000 0 4 4 rw--- libnih.so.1.0.0
00007f66caf35000 0 0 0 r-x-- ld-2.15.so
00007f66cafaf000 0 48 48 rw--- [ anon ]
00007f66cb133000 0 4 4 rw--- [ anon ]
00007f66cb155000 0 0 0 rw--- [ anon ]
00007f66cb157000 0 0 0 r---- ld-2.15.so
00007f66cb158000 0 0 0 rw--- ld-2.15.so
00007f66cb15a000 0 0 0 r-x-- upstart-udev-bridge
00007f66cb365000 0 4 4 r---- upstart-udev-bridge
00007f66cb366000 0 4 4 rw--- upstart-udev-bridge
00007f66cb750000 0 124 124 rw--- [ anon ]
00007f66cb771000 0 84997...


Steve Langasek (vorlon)
Changed in upstart (Ubuntu):
importance: Undecided → Medium
Revision history for this message
James Hunt (jamesodhunt) wrote :

This almost sounds like bug 1235649, but that seems unlikely given that upstart-udev-bridge uses the correct NIH D-Bus calls.

I suspect the reason for the memory growth is that the events that are being created cannot be destroyed until some other job has finished with them. Once that happens, memory will be reclaimed automatically by Upstart.

Please could those affected attach the output of 'initctl list' (compressed if necessary).

Revision history for this message
Andy Whitcroft (apw) wrote :

As a control I have been hammering a trusty system with a continuous stream of synthetic udev events and see no change in the RSS for upstart-udev-bridge. There is clearly some subtlety here. In some cases it seems clear there is a large and valid queue of pending work, which may account for the memory use.
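
One way to approximate such a control run (a sketch; note that udevadm trigger replays 'add' uevents rather than creating real interfaces, so it may not exercise exactly the same code path):

# replay add events for the net subsystem, then check the bridge's RSS
for i in $(seq 1 10000); do
    udevadm trigger --action=add --subsystem-match=net
done
grep VmRSS /proc/$(pidof upstart-udev-bridge)/status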

Revision history for this message
uj__ (unclejacksons) wrote :

The init process is certainly leaking memory. This can be observed by starting a lot of Docker containers which exit right away. I'm able to get upstart's init process to 50-100 MB of memory usage by running 5000-20000 Docker containers which only print something and then exit.
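
A reproduction along those lines might look like this (a sketch; the image name and counts are illustrative):

# compare init's RSS before and after churning short-lived containers
grep VmRSS /proc/1/status
for i in $(seq 1 5000); do
    docker run --rm busybox echo "container $i"
done
grep VmRSS /proc/1/status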

I've looked at the code and it seems like there may be some leaks. The upstart tests are also leaking memory. Some of that memory is leaked in the tests themselves (allocated and never freed), and some seems to be leaked in libnih itself.

Ubuntu 14.04 doesn't seem to have this problem at all.

Revision history for this message
James Hunt (jamesodhunt) wrote :

@uj__:

> The init process is certainly leaking memory.
How have you proved that init is leaking memory? Can you say that PID 1 is not just still busy processing the flurry of events that would result from starting all these docker containers? Please provide further details.

> I've looked at the code and it seems like there may be some leaks.
Where?

> The upstart tests are are also leaking memory. Some of that memory is leaked in the tests themselves (allocated and never freed)
The tests may well leak memory as they are not designed to be long-running like upstart.

> and some others seem to be leaking in libnih itself.
Again, where? Please provide logs. Note that libnih goes to some lengths to avoid leaks by doing automatic garbage collection. If you can produce a test program demonstrating a leak, please attach to this bug so we can investigate further.
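
(For reference, leak reports for such a test program are typically gathered with valgrind; ./leak-test is a hypothetical name for the reproducer:)

$ valgrind --leak-check=full --show-reachable=yes ./leak-test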

Revision history for this message
uj__ (unclejacksons) wrote :

@jamesodhunt: Please let me know what information I should provide (the output of specific commands and content of specific logs).

I don't think init is supposed to use 50 MB of RAM or more, forever, after running 5000 containers or fewer - even after its CPU usage drops back to 1-2%. The drop in CPU usage indicates that the events have probably been processed and everything should be back to normal, yet memory usage is still high.

Memory usage grows by a small amount even when running just a few containers; I can see it increase after every single Docker container run. That suggests the memory allocated per event is never released.
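
That per-container growth can be quantified directly (a sketch):

# measure init's RSS delta across a single container run
before=$(awk '/VmRSS/ {print $2}' /proc/1/status)
docker run --rm busybox true
sleep 2    # give init a moment to finish processing the resulting events
after=$(awk '/VmRSS/ {print $2}' /proc/1/status)
echo "init RSS grew by $((after - before)) kB"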

Revision history for this message
uj__ (unclejacksons) wrote :

# pmap -x 1
1: /sbin/init
Address Kbytes RSS Dirty Mode Mapping
00007f9799767000 0 16 0 r-x-- libnss_files-2.15.so
00007f9799773000 0 0 0 ----- libnss_files-2.15.so
00007f9799972000 0 4 4 r---- libnss_files-2.15.so
00007f9799973000 0 4 4 rw--- libnss_files-2.15.so
00007f9799974000 0 20 0 r-x-- libnss_nis-2.15.so
00007f979997e000 0 0 0 ----- libnss_nis-2.15.so
00007f9799b7e000 0 4 4 r---- libnss_nis-2.15.so
00007f9799b7f000 0 4 4 rw--- libnss_nis-2.15.so
00007f9799b80000 0 24 0 r-x-- libnsl-2.15.so
00007f9799b97000 0 0 0 ----- libnsl-2.15.so
00007f9799d96000 0 4 4 r---- libnsl-2.15.so
00007f9799d97000 0 4 4 rw--- libnsl-2.15.so
00007f9799d98000 0 0 0 rw--- [ anon ]
00007f9799d9a000 0 24 0 r-x-- libnss_compat-2.15.so
00007f9799da2000 0 0 0 ----- libnss_compat-2.15.so
00007f9799fa1000 0 4 4 r---- libnss_compat-2.15.so
00007f9799fa2000 0 4 4 rw--- libnss_compat-2.15.so
00007f9799fa3000 0 620 0 r-x-- libc-2.15.so
00007f979a158000 0 0 0 ----- libc-2.15.so
00007f979a357000 0 16 16 r---- libc-2.15.so
00007f979a35b000 0 8 8 rw--- libc-2.15.so
00007f979a35d000 0 20 20 rw--- [ anon ]
00007f979a362000 0 20 0 r-x-- librt-2.15.so
00007f979a369000 0 0 0 ----- librt-2.15.so
00007f979a568000 0 4 4 r---- librt-2.15.so
00007f979a569000 0 4 4 rw--- librt-2.15.so
00007f979a56a000 0 48 0 r-x-- libpthread-2.15.so
00007f979a582000 0 0 0 ----- libpthread-2.15.so
00007f979a781000 0 4 4 r---- libpthread-2.15.so
00007f979a782000 0 4 4 rw--- libpthread-2.15.so
00007f979a783000 0 4 4 rw--- [ anon ]
00007f979a787000 0 212 0 r-x-- libdbus-1.so.3.5.8
00007f979a7c9000 0 0 0 ----- libdbus-1.so.3.5.8
00007f979a9c9000 0 4 4 r---- libdbus-1.so.3.5.8
00007f979a9ca000 0 4 4 rw--- libdbus-1.so.3.5.8
00007f979a9cb000 0 32 0 r-x-- libnih-dbus.so.1.0.0
00007f979a9d3000 0 0 0 ----- libnih-dbus.so.1.0.0
00007f979abd3000 0 4 4 r---- libnih-dbus.so.1.0.0
00007f979abd4000 0 4 4 rw--- libnih-dbus.so.1.0.0
00007f979abd5000 0 76 0 r-x-- libnih.so.1.0.0
00007f979abeb000 0 0 0 ----- libnih.so.1.0.0
00007f979adeb000 0 4 4 r---- libnih.so.1.0.0
00007f979adec000 0 4 4 rw--- libnih.so.1.0.0
00007f979aded000 0 112 0 r-x-- ld-2.15.so
00007f979b003000 0 20 20 rw--- [ anon ]
00007f979b00d000 0 8 8 rw--- [ anon ]
00007f979b00f000 0 4 4 r---- ld-2.15.so
00007f979b010000 0 8 8 rw--- ld-2.15.so
00007f979...

