Very high CPU usage from 'creating cloud image metadata storage' emitted in debug log every second

Bug #1599779 reported by Andrew McDermott
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Critical
Ian Booth

Bug Description

Bootstrapping on Google I see the following output from the debug-log repeated many many times resulting in high CPU usage against jujud and mongod. This does not happen in beta10, so something since then.

started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers
creating cloud image metadata storage
started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers
creating cloud image metadata storage
started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers
creating cloud image metadata storage
started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers
creating cloud image metadata storage
started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers
creating cloud image metadata storage
started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers
creating cloud image metadata storage
started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers
creating cloud image metadata storage
started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
starting standard state workers

This repeated block of output is happening every few seconds; see the attached machine-0.log.

I tried to bisect this a little:

$ git bisect good
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
f44f3ccfc8f2aad18273ac2f4b5ee83afc7414d4
1db368791b4341a0482c57747935bb44642ff067
6a4a21838f35969a93c7fa8e4c5035186f833705
0a5bd61f6c85d9caf9ead484fc88d6dc525c4635
9ce1f5a9ec85b9750c9911b5a7cc42f2cc19fc4b
2e49459b30930bbceacdd09b4e5473b4f81d654b
535c3f0ac268d4b9c2d84126b57cef7df0f63b33
a392876040e8323c9fca9f53a0586cfcf0d4750e
We cannot bisect more!

Revision history for this message
Andrew McDermott (frobware) wrote :

ubuntu@juju-1f301f-0:~$ top

top - 09:58:09 up 3 min, 1 user, load average: 2.15, 0.95, 0.37
Tasks: 118 total, 2 running, 116 sleeping, 0 stopped, 0 zombie
%Cpu(s): 80.7 us, 13.0 sy, 0.0 ni, 0.0 id, 2.0 wa, 0.0 hi, 4.3 si, 0.0 st
KiB Mem : 1737260 total, 176804 free, 298424 used, 1262032 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1261332 avail Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 4050 root 20 0 485480 81340 45320 R 63.1 4.7 0:58.76 jujud
 3904 root 20 0 843428 205696 27580 S 33.6 11.8 0:31.89 mongod
  574 root 20 0 0 0 0 S 0.3 0.0 0:00.49 jbd2/sda1-8
  636 root 0 -20 0 0 0 S 0.3 0.0 0:00.26 kworker/0:1H
    1 root 20 0 37736 5804 3996 S 0.0 0.3 0:02.54 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
    3 root 20 0 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/0
    4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0
    5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
    6 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/u2:0
    7 root 20 0 0 0 0 S 0.0 0.0 0:00.04 rcu_sched
    8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
    9 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
   10 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
   11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs

Revision history for this message
Andrew McDermott (frobware) wrote :
Changed in juju-core:
importance: Undecided → Critical
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

I hope the CPU usage is related to bug #1594924. Doesn't help with the noisy log though.

Revision history for this message
Andrew McDermott (frobware) wrote :

Bumping this issue because I find everything now takes a long time to complete (e.g., juju add-machine lxd:0).

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Are you simply bootstrapping to GCE? Any other machines / containers / services?

Revision history for this message
Andrew McDermott (frobware) wrote : Re: [Bug 1599779] Re: Very high CPU usage from 'creating cloud image metadata storage' emitted in debug log every second

Just bootstrapping. Machine is quiescent in terms of doing anything real
with Juju. Adding a container takes a long time given the CPU usage. As do
various other Juju operations (status, show-machines, etc).

On 12 July 2016 at 13:46, James Tunnicliffe <email address hidden>
wrote:

> Are you simply bootstrapping to GCE? Any other machines / containers /
> services?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1599779
>
> Title:
> Very high CPU usage from 'creating cloud image metadata storage'
> emitted in debug log every second
>
> Status in juju-core:
> New
>
> Bug description:
> Bootstrapping on Google I see the following output from the debug-log
> repeated many many times resulting in high CPU usage against jujud and
> mongod. This does not happen in beta10, so something since then.
>
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
> creating cloud image metadata storage
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
> creating cloud image metadata storage
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
> creating cloud image metadata storage
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
> creating cloud image metadata storage
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
> creating cloud image metadata storage
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
> creating cloud image metadata storage
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
> creating cloud image metadata storage
> started state for model-3bdf9f1a-f7e9-4486-840c-d82d3846e3a6 successfully
> starting standard state workers
>
> This repeated block of output is happening every few seconds; see the
> attached machine-0.log.
>
> I tried to bisect this a little:
>
> $ git bisect good
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> f44f3ccfc8f2aad18273ac2f4b5ee83afc7414d4
> 1db368791b4341a0482c57747935bb44642ff067
> 6a4a21838f35969a93c7fa8e4c5035186f833705
> 0a5bd61f6c85d9caf9ead484fc88d6dc525c4635
> 9ce1f5a9ec85b9750c9911b5a7cc42f2cc19fc4b
> 2e49459b30930bbceacdd09b4e5473b4f81d654b
> 535c3f0ac268d4b9c2d84126b57cef7df0f63b33
> a392876040e8323c9fca9f53a0586cfcf0d4750e
> We cannot bisect more!
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju-core/+bug/1599779/+subscriptions
>

--
Andrew McDermott <email address hidden>
Juju Core Sapphire team <http://juju.ubuntu.com>

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

This would be it:

620d320e2ffd46078355ce95de297d26bb506c22 "Opening a State on behalf of a non-anonymous user will now cause a leadership manager to run (and be cleaned up when necessary)"

Looks like something is opening a state connection, trying to do something, failing, trying again...

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Well, confirmed but since the environment wouldn't bootstrap and I killed the bootstrap process so I could look at the logs, I now have a machine I can't kill. Yay.

My guess is that something like log forwarding opens state to log something, logs it, closes the connection (though it could be anything that is performing lots of state open/close calls). The fact that Mongo is sitting there being hammered points at something causing write amplification. Fun.

Changed in juju-core:
status: New → Triaged
Revision history for this message
James Tunnicliffe (dooferlad) wrote :
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

// SetLastSent is a bulk call that sets the log forwarding "last sent"
// record ID for each requested target.
func (api *LogForwardingAPI) SetLastSent(args params.LogForwardingSetLastSentParams) params.ErrorResults {
 results := make([]params.ErrorResult, len(args.Params), len(args.Params))
 for i, arg := range args.Params {
  results[i].Error = api.set(arg)
 }
 return params.ErrorResults{
  Results: results,
 }
}

calls

func (api *LogForwardingAPI) set(arg params.LogForwardingSetLastSentParam) *params.Error {
 lst, err := api.newLastSentTracker(arg.LogForwardingID)
 if err != nil {
  return common.ServerError(err)
 }
 defer lst.Close()

 err = lst.Set(arg.RecordID)
 return common.ServerError(err)
}

This creates a new API connection instead of reusing an old one. I assume recycling is a good thing here.

Changed in juju-core:
milestone: none → 2.0-beta12
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

I'm attaching a set of TRACE level logs from bootstrap onwards for a deployment exhibiting this problem. It's not completely clear but there seems like there's an issue around the new log forwarding feature.

Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Revision history for this message
Ian Booth (wallyworld) wrote :
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta12 → none
milestone: none → 2.0-beta12
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.