vgcreate/lvcreate in volume/service.py fail and go undetected

Bug #620027 reported by Armando Migliaccio
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I noticed that service.py under nova/volume contains this flag:

flags.DEFINE_string('storage_dev', '/dev/sdb', 'Physical device to use for volumes')

my host does not have a /dev/sdb, so vgcreate in _init_volume_group(self) fails and consequently lvcreate in _create_lv fails too. It seems that no exception is reported in the log file. Shouldn't storage_dev default to /dev/loop0, or to any other free loop device chosen by losetup and passed in via $NOVA_VOLUME_ARGS?

Thanks,
Armando
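
For reference, a minimal sketch of the loop-device setup being suggested here, assuming a sparse backing file and standard losetup behaviour; the backing-file path is illustrative only:

# Illustrative only: back a free loop device with a sparse file so that
# storage_dev can point at a real block device on hosts without a
# spare /dev/sdb.
import subprocess

backing_file = '/var/lib/nova/volumes.img'   # hypothetical path
subprocess.check_call(['truncate', '-s', '10G', backing_file])
# 'losetup --find --show' attaches the file to the next free loop
# device and prints the device name it picked.
loop_dev = subprocess.check_output(
        ['sudo', 'losetup', '--find', '--show', backing_file]).strip()
print('pass --storage_dev=%s to nova-volume' % loop_dev)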

description: updated
description: updated
Revision history for this message
Vish Ishaya (vishvananda) wrote : Re: [Bug 620027] Re: vgcreate/lvcreate in volume/service.py fail and go undetected

You can pass in a flag for a different storage device -->
./nova-volume --nodaemon --verbose --storage_dev=/dev/loop0
or in /etc/nova/nova-volume.conf:
--storage_dev=/dev/loop0

Also, if you just make sure that the volume group already exists, it will
work. The group should be called 'nova-volumes'. You can also specify a
different volume group name with a flag
--volume_group=vgfoo

Vish
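
A minimal sketch of the pre-creation described above, assuming a loop device like the one set up earlier and the default group name; nova-volume then only has to find 'nova-volumes' rather than create it:

# Illustrative only: create the physical volume and the 'nova-volumes'
# volume group up front so nova-volume does not have to.
import subprocess

storage_dev = '/dev/loop0'       # whatever --storage_dev points at
volume_group = 'nova-volumes'    # default; override with --volume_group
subprocess.check_call(['sudo', 'pvcreate', storage_dev])
subprocess.check_call(['sudo', 'vgcreate', volume_group, storage_dev])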


Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

You are right: if you make sure that the volume group already exists, the volume creation does work. However, I went back to my config and saw that I do pass the flag storage_dev=/dev/loop0 on the command line. I also noticed that I do not pass the --nodaemon switch.

When I launch nova-volume without --nodaemon, 'vgcreate' does not seem to get called. Is that possible? When I launch nova-volume with --nodaemon, vgcreate does get called. I instrumented the code and compared the log output in the two cases (the last lines of the logs are the interesting part):

**** NOVA-VOLUME WITHOUT --NODAEMON SWITCH ****
Starting Nova Volume
DEBUG:root:Full set of FLAGS:
DEBUG:root:help : None
DEBUG:root:storage_availability_zone : nova
DEBUG:root:volume_topic : volume
DEBUG:root:verbose : True
DEBUG:root:encrypted : None
DEBUG:root:compute_topic : compute
DEBUG:root:default_kernel : aki-11111
DEBUG:root:report_profile : None
DEBUG:root:rabbit_password : guest
DEBUG:root:syslog : None
DEBUG:root:prefix : nova-volume
DEBUG:root:vpn_key_suffix : -key
DEBUG:root:ec2_url : http://localhost:8773/services/Cloud
DEBUG:root:originalname : None
DEBUG:root:rundir : .
DEBUG:root:profiler : hotshot
DEBUG:root:uid : None
DEBUG:root:connection_type : libvirt
DEBUG:root:fake_rabbit : False
DEBUG:root:s3_port : 3333
DEBUG:root:help_reactors : None
DEBUG:root:rabbit_host : 10.70.177.14
DEBUG:root:source : None
DEBUG:root:process_pool_size : 4
DEBUG:root:umask : None
DEBUG:root:nothotshot : None
DEBUG:root:debug : False
DEBUG:root:fake_storage : False
DEBUG:root:redis_db : 0
DEBUG:root:gid : None
DEBUG:root:volume_group : nova-volumes
DEBUG:root:reactor : None
DEBUG:root:pidfile : /home/openstack/openstack/nova-volume.pid
DEBUG:root:savestats : None
DEBUG:root:rabbit_userid : guest
DEBUG:root:storage_dev : /dev/loop0
DEBUG:root:file : twistd.tap
DEBUG:root:default_instance_type : m1.small
DEBUG:root:report_interval : 10
DEBUG:root:blades_per_shelf : 16
DEBUG:root:node_availability_zone : nova
DEBUG:root:version : None
DEBUG:root:aoe_eth_dev : eth0
DEBUG:root:auth_token_ttl : 3600
DEBUG:root:rabbit_port : 5672
DEBUG:root:chroot : None
DEBUG:root:profile : None
DEBUG:root:euid : None
DEBUG:root:vpn_image_id : ami-CLOUDPIPE
DEBUG:root:logfile : nova-volume.log
DEBUG:root:nodaemon : None
DEBUG:root:b : None
DEBUG:root:last_shelf_id : 149
DEBUG:root:no_save : True
DEBUG:root:aoe_export_dir : /var/lib/vblade-persist/vblades
DEBUG:root:rabbit_virtual_host : /
DEBUG:root:node_name : phantom
DEBUG:root:redis_host : 127.0.0.1
DEBUG:root:spew : None
DEBUG:root:r : None
DEBUG:root:default_image : ami-11111
DEBUG:root:control_exchange : nova
DEBUG:root:default_ramdisk : ari-11111
DEBUG:root:redis_port : 6379
DEBUG:root:s3_host : 127.0.0.1
DEBUG:root:python : /home/openstack/openstack/nova/trunk/bin/nova-volume
DEBUG:root:first_shelf_id : 140
DEBUG:root:fake_network : False
DEBUG:root:network_topic : network
WARNING:root:Starting volume node
DEBUG:root:*** before_pvcreate ***
DEBUG:root:Executing: sudo ['pvcreate', '/dev/loop0']:
DEBUG:root:>> execute
DEBUG:root:<< execute
DEBUG:root:exe output: <Deferred at 0xa700dec current result: <Deferred at 0xa700e2c>>

The last few lines are l...


Revision history for this message
justinsb (justin-fathomdb) wrote :

pvcreate is immediately followed by vgcreate in the code (though both are deferreds). So if pvcreate is being called but vgcreate is not, then it sounds like pvcreate is raising an exception.

I feel that any errors should be logged, and indeed, looking at the code, I don't see how an error could go unlogged. Does anything get printed on stdout/stderr by nova-volume (particularly in the --nodaemon case)?

You might try merging my branch which checks the results of spawned processes:
bzr merge lp:~justin-fathomdb/nova/check-subprocess-exit-code

I don't think that will give you a better error message (though it might!), but what it will do is stop treating messages on stderr as failures. For instance (speculating), perhaps the first time you call pvcreate it loads a kernel module, which prints a message on stderr, which would cause a failure only on the first run.

Unfortunately, it looks like our twisted module calls into twistd.runApp, which appears to be an undocumented twisted function (http://twistedmatrix.com/trac/wiki/UndocumentedScripts). Any Twisted people able to comment on why no error is being logged when exceptions are thrown at startup?
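
For illustration, a minimal sketch of checking a spawned process's exit status with Twisted's stock utilities; this is not the code from the branch above, just the general technique it describes (fail only on a non-zero exit code, not on stderr output):

# Illustrative sketch: run a command, log stdout/stderr/exit code, and
# fail the deferred only when the exit code is non-zero.
import logging

from twisted.internet import defer, utils


@defer.inlineCallbacks
def checked_execute(executable, args):
    out, err, code = yield utils.getProcessOutputAndValue(executable, args)
    logging.debug("%s %s: stdout=%r stderr=%r exit=%d",
                  executable, ' '.join(args), out, err, code)
    if code != 0:
        raise RuntimeError("%s exited with code %d" % (executable, code))
    defer.returnValue(out)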

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

In the --nodaemon case nova-volume works like a charm: it creates both the physical volume and the volume group. It's the case without the --nodaemon switch that has trouble. The difference between the two pvcreate outputs:

*** WITH --nodaemon switch ***

  --- Physical volume ---
  PV Name /dev/loop0
  VG Name nova-volumes
  PV Size 10.00 GiB / not usable 4.00 MiB
  Allocatable yes
  PE Size 4.00 MiB
  Total PE 2559
  Free PE 2559
  Allocated PE 0
  PV UUID 5AKxZf-9TlO-3CSF-8UoG-GHzF-v910-CqIsEg

*** WITHOUT --nodaemon switch ***
  "/dev/loop0" is a new physical volume of "10.00 GiB"
  --- NEW Physical volume ---
  PV Name /dev/loop0
  VG Name
  PV Size 10.00 GiB
  Allocatable NO
  PE Size 0
  Total PE 0
  Free PE 0
  Allocated PE 0
  PV UUID FWH78L-I3b2-eMS6-hE43-V7dp-t1Fb-ksNDTI

It looks like pvcreate fails silently, but I don't see any messages in /var/log/syslog or /var/log/messages. Any clues?

Revision history for this message
justinsb (justin-fathomdb) wrote :

Other than the earlier suggestions (merge in the error checking branch)...

What perplexes me is that it looks like pvcreate is succeeding in both cases, because a PV appears to be created.

If it doesn't depend on the order in which you run the commands, then perhaps it's a permissions problem? (I wonder if the sudo is causing trouble.) Who are you running this as? Can you try running as root (e.g. sudo bash beforehand)?

Also, I noticed you seem to have uid and gid flags... do you by any chance have ~soren/nova/derootification merged in? Are you running off a clean trunk?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am running off a clean trunk (the latest) and as root. I also tried merging ~justin-fathomdb/nova/check-subprocess-exit-code, but I haven't gotten any better error messages. I still experience the same issue: the volume group is not created at initialization if nova-volume runs as a daemon.

If the physical volume/volume group already exists, pvcreate and vgcreate both fail with exit code 5 (I can see this when I run the commands from the shell). This means nova-volume runs into the same error, but the log does not record any of it.

Revision history for this message
justinsb (justin-fathomdb) wrote :

Inspired by that exit code 5 comment, would it be correct to rephrase the bug as "if the PV / VG already exists, the storage manager fails to launch"? Or are you deleting the PV/VG in between runs?

The "if it already exists" bug probably exists, even if it isn't your problem here.

What I may do is fix the "if it already exists" bug, and add more logging of the stdout/stderr/exit code in case of problems (which I'll have to look at anyway as part of the bug fix).

But do please let me know whether you're cleaning up the PV/VG in between (sounds like you probably are...)
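
A rough sketch of what an "already exists"-tolerant init might look like; it assumes the nova trunk layout discussed here (FLAGS, process.simple_execute, the module's existing imports) and that simple_execute fails its deferred on a non-zero exit code, as in the check-subprocess-exit-code branch:

# Rough sketch only, not the actual fix: skip creation when the volume
# group is already there, so vgcreate can no longer fail with
# "already exists" (exit code 5).
@defer.inlineCallbacks
def _init_volume_group(self):
    if FLAGS.fake_storage:
        return
    try:
        # 'vgs <name>' exits non-zero when the group does not exist
        yield process.simple_execute(
                "sudo vgs %s" % FLAGS.volume_group)
    except Exception:
        yield process.simple_execute(
                "sudo vgcreate %s %s" % (FLAGS.volume_group,
                                         FLAGS.storage_dev))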

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am deleting PV/VG in between runs.

self.stderr.write() and self.stdout.write() do not write to my console in either mode (daemon or nodaemon), so I replaced them with log traces, and I can now see that when nova-volume runs in daemon mode, vgcreate does not get called at all! Is it possible that the process.simple_execute call gets lost somewhere in the call chain?
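
A purely speculative illustration of one way a yield-based call chain can go missing: in Python, calling a generator function only creates the generator object, so unless something (e.g. defer.inlineCallbacks or the service machinery that wraps it) actually drives the generator, the body, including its simple_execute calls, never runs:

# Speculative illustration only: the body of a generator function does
# not execute until the generator is iterated.
def init_volume_group():
    print('about to run vgcreate')   # not printed by the call below
    yield "sudo vgcreate nova-volumes /dev/loop0"

gen = init_volume_group()   # creates the generator; body has NOT run
for command in gen:         # only now does the body execute
    print(command)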

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

By the way, with your changes I can see in the log that the pvcreate call fails in the "if it already exists" case, but that (as you pointed out) is not the problem here. It's just this oddity about daemon mode, which might potentially affect every other service!

Revision history for this message
Vish Ishaya (vishvananda) wrote :

It would be great to figure out why this is happening, but ultimately it
might be better if nova-volume didn't go around creating volume groups at
all and just checked that the right one exists.


Revision history for this message
Vish Ishaya (vishvananda) wrote :

This may be an unrelated problem, but there is a particularly nasty issue
with LVM interacting with AoE. The various LV commands will hang trying to
stat orphaned AoE devices, and they hang badly in a system call and can't
be killed. The best solution is to add something to the LVM config so it
doesn't try to stat the AoE devices. My filter in /etc/lvm/lvm.conf looks
like this:

    filter = [ "r|/dev/etherd/.*|", "r|/dev/block/.*|", "a/.*/" ]


Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I commented out the pvcreate command under _init_volume_group:

def _init_volume_group(self):
    if FLAGS.fake_storage:
        return
    # yield process.simple_execute(
    #         "sudo pvcreate %s" % (FLAGS.storage_dev))
    yield process.simple_execute(
            "sudo vgcreate %s %s" % (FLAGS.volume_group,
                                     FLAGS.storage_dev))

and let nova-volume create the physical volume and the group in one go. The command's output looks like this:

No physical volume label read from /dev/loop0
  Physical volume "/dev/loop0" successfully created
  Volume group "nova-volumes" successfully created

When I run nova-volume as a daemon I finally get the volume group created! I know it does not sound like a bug fix, but commenting pvcreate out does circumvent the problem.

If vgcreate takes care of the "pvcreation" too, would it make sense to have just one simple_execute call?

Revision history for this message
Jay Pipes (jaypipes) wrote :

Ping on this bug. Where are we with this? Vish, Justin, has anything been fixed in this regard? Is the bug valid? Trying to do a little maintenance on outstanding bugs... thanks.

Changed in nova:
status: New → Incomplete
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

<a bit of housekeeping> After the eventlet merge and the latest developments on the nova branch, I think this bug report no longer applies.

Changed in nova:
status: Incomplete → Invalid