Bug #1322302 “local provider is very slow to tranistion from age...” : Bugs : juju-core package : Ubuntu

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-22:

#1

/var/log/juju-ubuntu-local/* files Edit (10.3 KiB, application/x-tar)

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-22:

#2

machnice log for kvm created by juju

ubuntu@ubuntu:/var/log/juju$ pastebinit machine-1.log
http://paste.ubuntu.com/7502559/

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-22:

#3

machine-1.log Edit (4.3 KiB, text/plain)

machine-1.log (log for kvm created by juju - hard copy of pastebin from previous comment)

Nate Finch (natefinch) on 2014-05-22

Changed in juju-core:
status:	New → Triaged
importance:	Undecided → High

Revision history for this message

Ian Booth (wallyworld) wrote on 2014-05-22:

#4

When Juju uses LXC containers, the very first container is always slow to come up because it needs to download the image. We added clone support to make subsequent containers start much faster.

There are a couple of options to make the initial template download faster.
1. Juju supports an http-proxy environment config option (as well as apt-http-proxy).
https://juju.ubuntu.com/docs/howto-proxies.html
These proxy settings are propagated to the shell which calls lxc-create, which in turn calls wget to fetch the image.
So if you were to use a squid cache of some sort to provide the lxc images, and set up the proxy config, that would be a solution.

2. Have the Orange boxes configured to use a transparent proxy.
I believe Kapil is talking to Dustin about this option.

Having said all that, we also need to improve juju status to show more than just Pending. That's on the todo list.

If you do in fact have a cache all set up and it's still really slow, then we need to look deeper. The status changes from pending to started when the machine comes up and phones home. That can take a little time based on how fast the machine starts.

Can you advise how it behaves once you have a proxy all set up?

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-23:

#5

It is KVM, not LXC and the image download is not the problem as the VM is up and running within one minute. I have even logged into it directly to verify . The problem is that juju sits in an agent-status: pending for up to nine minutes after that.

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-23:

#6

juju-ubuntu-local with debug Edit (2.4 MiB, application/x-tar)

another run, same issue but this time with debug level logging (attached)

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-23:

#7

Also, in order to get rid of the one minute image download overhead I have done runs where I first synced the image locally with:

uvt-simplestreams-libvirt sync release=trusty arch=amd64

this increased the speed to which the KVM started but still had the post KVM start delay of 8-9 minutes.

Revision history for this message

Kapil Thangavelu (hazmat) wrote on 2014-05-23:

#8

Thanks for uploading the debug logs.. it looks like a 7m gap from the start of provisioning the kvm instance, till the agent first starts up. Can you attach the /var/log/cloud-init-output.log .. that will help identify whats taking the additional time.

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-23:

#9

http://paste.ubuntu.com/7505203/ <- /var/log/cloud-init.log from the slow KVM

Revision history for this message

Kapil Thangavelu (hazmat) wrote on 2014-05-23:

#10

This seems to be causing most of the time loss. Is there a local proxy/mirror ? Is the image out of date?

May 22 20:57:25 ubuntu [CLOUDINIT] util.py[DEBUG]: apt-upgrade [eatmydata apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet dist-upgrade] took 353.061 seconds

Revision history for this message

Kapil Thangavelu (hazmat) wrote on 2014-05-23:

#11

confirmed this is not a juju issue. seems to be a nested virt perf or other local setup issue.

Revision history for this message

Alexis Bruemmer (alexis-bruemmer) wrote on 2014-05-23:

#12

If this is not a juju issue can we close/reject the bug?

Revision history for this message

Ian Booth (wallyworld) wrote on 2014-05-25:

#13

Marking as incomplete unless we identify a problem we can fix in juju

Changed in juju-core:
status:	Triaged → Incomplete

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-27:

#14

slowness not attributed to juju but to uvt-tool and cloudinit.

affects:	juju-core → uvtool
Changed in uvtool:
assignee:	nobody → Canonical Server Team (canonical-server)

Revision history for this message

Robie Basak (racb) wrote on 2014-05-27:

#15

I think investigation needs to start from a qemu performance perspective. It may be that uvtool needs to tell libvirt to tell qemu to do something differently, but in that case I need to know what.

affects:	uvtool → qemu
affects:	qemu → qemu (Ubuntu)
Changed in qemu (Ubuntu):
status:	Incomplete → New

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2014-05-27:

#16

Can someone show the actual dpkg term log to show how much needed upgrading?

Can you confirm that kvm is being run with acceleration?

Since you're using uvtool I assume you're also using a qcow2, which further degrades performance...

Revision history for this message

Adam Stokes (adam-stokes) wrote on 2014-05-28:

#17

Reproducer steps:

Create a trusty virtual machine using the defaults provided with uvt tool.
ssh into that virtual machine and setup juju environments.yaml to be a local provider with a container-type: kvm

juju bootstrap
juju add-machine

Revision history for this message

Adam Stokes (adam-stokes) wrote on 2014-05-28:

#18

environments.yaml:

default: local

environments:
  local:
    type: local
    container: kvm
    lxc-use-clone: true

Revision history for this message

Ryan Harper (raharper) wrote on 2014-05-28:

#19

It takes 164 seconds in cloud init to bring up the first machine.

What's the outer VM config (resources), likewise, what's the host machine resource level?

Revision history for this message

Ryan Harper (raharper) wrote on 2014-05-28:

#20

Download full text (4.2 KiB)

Summary
-------

Add-machine: Wed May 28 14:31:56 UTC 2014 (0)
KVM Ssh'able: Wed May 28 14:33:15 UTC 2014 (+79)
Cloud-init done: Wed May 28 14:35:14 UTC 2014 (+119)
Juju status started: Wed May 28 14:35:38 UTC 2014 (+24)

Total time: 222 seconds: 3 min, 40 seconds.

----------------------------------------
ubuntu@kvmjuju:~$ date; juju add-machine --constraints "cpu-cores=4 root-disk=20G"
Wed May 28 14:31:56 UTC 2014
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:32:02 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    instance-id: pending
    series: trusty
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:32:24 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: pending
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ juju ssh 5 date
ERROR machine "5" has no public address
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:33:15 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: pending
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:34:36 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: pending
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:14 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: down
    agent-state-info: (started)
    agent-version: 1.18.3.1
    dns-name: 10.0.3.67
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:19 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: down
    agent-state-info: (started)
    agent-version: 1.18.3.1
    dns-name: 10.0.3.67
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:21 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: down
    agent-state-info: (start...

Summary
-------

Add-machine:         Wed May 28 14:31:56 UTC 2014 (0)
KVM Ssh'able:        Wed May 28 14:33:15 UTC 2014 (+79)
Cloud-init done:     Wed May 28 14:35:14 UTC 2014 (+119)
Juju status started: Wed May 28 14:35:38 UTC 2014 (+24)

Total time: 222 seconds: 3 min, 40 seconds.

----------------------------------------
ubuntu@kvmjuju:~$ date; juju add-machine --constraints "cpu-cores=4 root-disk=20G"
Wed May 28 14:31:56 UTC 2014
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:32:02 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    instance-id: pending
    series: trusty
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:32:24 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: pending
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ juju ssh 5 date
ERROR machine "5" has no public address
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:33:15 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: pending
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:34:36 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: pending
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:14 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: down
    agent-state-info: (started)
    agent-version: 1.18.3.1
    dns-name: 10.0.3.67
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:19 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: down
    agent-state-info: (started)
    agent-version: 1.18.3.1
    dns-name: 10.0.3.67
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:21 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: down
    agent-state-info: (started)
    agent-version: 1.18.3.1
    dns-name: 10.0.3.67
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:23 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: down
    agent-state-info: (started)
    agent-version: 1.18.3.1
    dns-name: 10.0.3.67
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$ date; juju status
Wed May 28 14:35:38 UTC 2014
environment: local
machines:
  "0":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: localhost
    instance-id: localhost
    series: trusty
  "5":
    agent-state: started
    agent-version: 1.18.3.1
    dns-name: 10.0.3.67
    instance-id: ubuntu-local-machine-5
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=512M root-disk=20480M
services: {}
ubuntu@kvmjuju:~$

Revision history for this message

Ryan Harper (raharper) wrote on 2014-05-28:

#21

One modification to the outer KVM that may help here:

adding --unsafe-cache to your first level KVM will use host ram to cache guest writes to the qcow2 image -- this means what it means, if kill your VMs or host machine has power failure, you'll lose data. With that in place the same test resulted in add-machine to status: started of: 149 seconds.

Revision history for this message

Ian Booth (wallyworld) wrote on 2014-05-28:

#22

A note from the juju side of the fence - Juju currently just syncs the latest released image when calling uvt-simplestreams-libvirt. It could be configured to add a --source arg to get a daily image if required. But configuration adds complexity so if the released kvm images were updated more often, I guess that would be preferable.

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-29:

#23

looks like part of the problem was that we were running with constraints on cpu greater then the level1 KVM had allocated to it.

base KVM had 1 cpu, juju was asking for three:

juju add-machine --constraints "mem=3G root-disk=20G cpu-cores=3"

if we modify the base KVM to have 3 cpus the time drops from 13.5 miutes to 4.5 minutes.

Adding cache='unsafe' further reduces the time down to 3 minutes.

That said, I still think we need to have an option to tell juju to pull the latest daily images in order to reduce the number of updates applied via cloudinit.

uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily release=precise arch=amd64

^^ This would be the uvtool cmdline to sync the latest daily

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2014-05-29: Re: [Bug 1322302] Re: local provider is very slow to tranistion from agent-status: pending

#24

So it looks like this should be reclassified against juju again?

Revision history for this message

Kapil Thangavelu (hazmat) wrote on 2014-05-29:

#25

using latest dailies images and/or hacking juju to not blindly issue
apt-get update/upgrade is something i often to do safe a minute per unit
(x100s). its also the only way to get kernel updates sans rebooting a
machine post deploy. a daily should get us to the same place as the last
release plus upgrades, however it does not get qa'd by the cloud images
team.

On Thu, May 29, 2014 at 11:33 AM, Serge Hallyn <email address hidden>
wrote:

> So it looks like this should be reclassified against juju again?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1322302
>
> Title:
> local provider is very slow to tranistion from agent-status: pending
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1322302/+subscriptions
>

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2014-05-29:

#26

Ok, then it sounds like the bug should be either invalid or wontfix - be it against qemu or juju. Probably both, since it is the case for both.

As I've been repeatedly told, nested qemu (especially on intel) is a convenience/developer feature, and all but unsupported upstream. So we either need to take it upon ourselves to move resources to address it ourselves (as much as possible), or accept facts as they are when designing on top of it.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2014-05-29:

#27

Actually the first two suggestions in comment #23 seem perfectly valid against juju - do not overprovision, at least, and perhaps, conditionally, use cache=unsafe. I would assume that in a developer situation, if power is lost during a juju deploy, all vms would get wiped and the charms re-deployed from scratch anyway.

Revision history for this message

Daniel Westervelt (danwest) wrote on 2014-05-29:

#28

cache=unsafe only helps a litte. we either need to support daily or more frequent images by default, mainly to avoid the cost of kernel updates or support a feature in juju to allow us to configure what images we want to pull.

as a side note, I am happy to get ride of nested kvm if we can get all Openstack services running in containers.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2014-05-29:

#29

Quoting Daniel Westervelt (<email address hidden>):
> cache=unsafe only helps a litte.

In comment #23 you said it brought the time from 4.5 to 3 minutes, a 33%
improvement.

> we either need to support daily or more
> frequent images by default, mainly to avoid the cost of kernel updates
> or support a feature in juju to allow us to configure what images we
> want to pull.

That still all falls under the purvue of juju. All I am pointing out is
that the bug should be marked affecting juju.

> as a side note, I am happy to get ride of nested kvm if we can get all
> Openstack services running in containers.

One step in this direction would be if we could use mountlo, which is
fuse-based, in a container. I haven't had a chance to test it, but it
is packaged and in theory it should work.

But that is unrelated to the fact that this bug's status against
qemu is either invalid (upstream) or wontfix (ubuntu package).

Patricia Gaughen (gaughen) on 2014-05-29

affects:	qemu (Ubuntu) → juju-core (Ubuntu)
Changed in juju-core (Ubuntu):
assignee:	Canonical Server Team (canonical-server) → nobody

Ian Booth (wallyworld) on 2014-05-30

Changed in juju-core:
importance:	Undecided → High
status:	New → Triaged
milestone:	none → next-stable

Revision history for this message

Ian Booth (wallyworld) wrote on 2014-07-17:

#30

The ability to specify a different image stream to allow daily images to be used query simpleastreams images was added in 1.17.2, but this was for cloud nodes started by the provider directly in the cloud, not images fetched by uvt-simplestreams-libvirt as is done when creating a kvm container.

We can extend the kvm provisioning component of Juju to also honour the "image-stream" configuration setting.

Additionally, there is now also agreement that we are to introduce configuration settings to disable apt update/upgrade steps at bootstrap, reducing provisioning time even further.

Changed in juju-core:
milestone:	next-stable → 1.21-alpha1

Katherine Cox-Buday (cox-katherine-e) on 2014-07-21

Changed in juju-core:
assignee:	nobody → Katherine Cox-Buday (cox-katherine-e)
status:	Triaged → In Progress

Revision history for this message

Katherine Cox-Buday (cox-katherine-e) wrote on 2014-08-13:

#31

The first half of this fix has been landed into trunk. When users specify a stream in the "image-stream" config setting, Juju will use this to determine which simple stream to sync. By specifying "daily", new machines will receive a more up-to-date image thus shortening the time it takes to perform apt-get update/upgrade.

The second half which is designed to make OS update behavior configurable is in the works.

Revision history for this message

Robie Basak (racb) wrote on 2014-08-13:

#32

Note that bug 1317680 may be a potential issue here. If you end up in the situation that you've previously synced the same release from both a daily and a released stream, then an attempt to create a container will be unambiguous and thus fail. In that bug the issue was that the user had manually done it locally. Here, the problem may be that Juju trips itself up by ending up in that situation, if Juju mixes use of released and daily streams.

I suppose a wider issue here is that we don't currently have a good answer inside simplestreams for syncing images from multiple streams into the same pool.

Katherine Cox-Buday (cox-katherine-e) on 2014-08-26

Changed in juju-core:
status:	In Progress → Fix Committed

Revision history for this message

Katherine Cox-Buday (cox-katherine-e) wrote on 2014-08-26:

#33

Thanks for the information, Robie. I am assigned to 1317680 as well, so I'll keep your comments in mind.

Revision history for this message

Katherine Cox-Buday (cox-katherine-e) wrote on 2014-08-26:

#34

Oops, looks like I was mistaken. I'm not assigned. I see you cross-linked your comment, so whomever gets assigned should have enough to go on.

Curtis Hovey (sinzui) on 2014-09-10

Changed in juju-core:
status:	Fix Committed → Fix Released

Curtis Hovey (sinzui) on 2015-09-25

Changed in juju-core (Ubuntu):
status:	New → Fix Released

Ubuntu
juju-core package

local provider is very slow to tranistion from agent-status: pending

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Affects		Status	Importance	Assigned to	Milestone
	juju-core	Fix Released	High	Katherine Cox-Buday	juju-core 1.21-alpha1
	juju-core (Ubuntu)	Fix Released	High	Unassigned

Ubuntujuju-core package

local provider is very slow to tranistion from agent-status: pending

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
juju-core package