ceph: test_volume_boot_pattern fails with "ImageNotFound: error protecting snapshot"

Bug #1627220 reported by Matt Riedemann
This bug affects 2 people
Affects               Status       Importance  Assigned to  Milestone
Cinder                Confirmed    Undecided   Unassigned
devstack-plugin-ceph  In Progress  Undecided   Unassigned

Bug Description

Seen here:

http://logs.openstack.org/10/306010/6/gate/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial/4f8f1f9/logs/screen-c-vol.txt.gz?level=TRACE#_2016-09-23_20_25_43_699

2016-09-23 20:25:43.699 ERROR oslo_messaging.rpc.server [req-5d2e8b92-26f9-43c9-9cd0-2f47ee45952a tempest-TestVolumeBootPatternV2-150096293] Exception during message handling
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 4377, in create_snapshot
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server snapshot)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 851, in create_snapshot
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server snapshot.save()
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server self.force_reraise()
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 843, in create_snapshot
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server model_update = self.driver.create_snapshot(snapshot)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/drivers/rbd.py", line 762, in create_snapshot
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server volume.protect_snap(snap)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server File "rbd.pyx", line 1403, in rbd.Image.protect_snap (/build/ceph-XmVvyr/ceph-10.2.2/src/build/rbd.c:14223)
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server ImageNotFound: error protecting snapshot volume-42036fae-421c-4f7d-8e98-c16f13319140@snapshot-b7a09aa6-a8a1-495d-bfd5-1f48efa92619
2016-09-23 20:25:43.699 14681 ERROR oslo_messaging.rpc.server
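The innermost frames show the RBD driver's create_snapshot failing inside protect_snap. A snapshot has to exist before it can be protected, so the driver must already have created it on the same image. The sketch below shows that create-then-protect sequence against a duck-typed image object so it runs without a Ceph cluster. In a real deployment the object would be an rbd.Image from the python-rbd bindings. The helper name create_protected_snapshot and the SnapshotCapable protocol are hypothetical; this is not Cinder's actual code.

```python
from typing import Protocol


class SnapshotCapable(Protocol):
    """The subset of rbd.Image methods this sketch relies on (assumption)."""

    def create_snap(self, name: str) -> None: ...

    def protect_snap(self, name: str) -> None: ...


def create_protected_snapshot(image: SnapshotCapable, snap_name: str) -> str:
    """Hypothetical helper: create an RBD snapshot, then protect it.

    On Jewel, a snapshot must be protected before it can be cloned, and a
    protected snapshot cannot be removed until it is unprotected.
    """
    # Step 1: create the point-in-time snapshot on the image.
    image.create_snap(snap_name)
    # Step 2: protect it. In the failing gate run, this is the call that
    # raised ImageNotFound; any exception simply propagates to the caller.
    image.protect_snap(snap_name)
    return snap_name
```

With a stub image that records its calls, the helper invokes create_snap and then protect_snap with the same name.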

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ImageNotFound%3A%20error%20protecting%20snapshot%5C%22%20AND%20message%3A%5C%22rbd.Image.protect_snap%5C%22%20AND%20message%3A%5C%22create_snapshot%5C%22%20AND%20tags%3A%5C%22screen-c-vol.txt%5C%22&from=7d
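The logstash link above keeps its search parameters URL-encoded inside the fragment (after the "#"). This short sketch uses only the Python standard library to decode them and show which log messages the query fingerprints. The function name decode_dashboard_query is hypothetical. Stripping the backslash-escaped quotes assumes they are an artifact of how the dashboard serialises the query.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

LOGSTASH_URL = (
    "http://logstash.openstack.org/#dashboard/file/logstash.json?query="
    "message%3A%5C%22ImageNotFound%3A%20error%20protecting%20snapshot%5C%22"
    "%20AND%20message%3A%5C%22rbd.Image.protect_snap%5C%22"
    "%20AND%20message%3A%5C%22create_snapshot%5C%22"
    "%20AND%20tags%3A%5C%22screen-c-vol.txt%5C%22&from=7d"
)


def decode_dashboard_query(url: str) -> Dict[str, str]:
    """Return the dashboard parameters stored in the URL fragment."""
    # The dashboard's parameters live in the fragment, so the URL's regular
    # query component is empty; split the fragment on its own "?".
    fragment = urlsplit(url).fragment
    _, _, params = fragment.partition("?")
    # parse_qs percent-decodes each value and returns lists of values.
    decoded = {key: values[0] for key, values in parse_qs(params).items()}
    # Undo the backslash-escaped quotes (\") kept in the stored query.
    return {key: value.replace('\\"', '"') for key, value in decoded.items()}


params = decode_dashboard_query(LOGSTASH_URL)
print(params["query"])
print(params["from"])
```

The decoded query matches messages that contain the error text, the rbd.Image.protect_snap frame, and create_snapshot in screen-c-vol.txt logs, over the last seven days.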

Tags: ceph rbd snapshot
Matt Riedemann (mriedem)
Changed in cinder:
status: New → Confirmed
tags: added: snapshot
Mohammed Naser (mnaser) wrote :

We're hitting this in our internal CI against our public cloud (running Tempest smoke tests). I'm not sure why, but I wanted to report that we're seeing it too.

We run Tempest master as well.

Matt Riedemann (mriedem) wrote :

Looks like this started spiking around 9/24:

https://goo.gl/FRSBHK

Jason Dillaman (jdillaman) wrote :

The logs from the linked test run show a Xenial host running Ceph 10.2.2. There was a bug [1], fixed in v10.2.3, that could cause heavily overloaded systems to drop update notifications. I'll submit a workaround patch for devstack-plugin-ceph that should avoid the issue regardless of the Ceph version in use.

[1] http://tracker.ceph.com/issues/16404
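To tell whether a given host predates the fix mentioned above, the version can be compared numerically against 10.2.3. The sketch below assumes the input is text that contains an X.Y.Z version, such as `ceph --version` output or the package build path seen in the traceback (ceph-10.2.2). The names FIXED_IN, parse_ceph_version and predates_fix are hypothetical, and treating every later release as containing the fix is an assumption.

```python
import re
from typing import Tuple

# Assumption taken from the comment above: the fix ships in v10.2.3.
FIXED_IN: Tuple[int, int, int] = (10, 2, 3)


def parse_ceph_version(text: str) -> Tuple[int, int, int]:
    """Extract the first X.Y.Z version number found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def predates_fix(text: str) -> bool:
    """True if the version in `text` is older than FIXED_IN.

    Tuples compare element by element as integers, so 10.2.10 correctly
    sorts after 10.2.3 (a plain string comparison would get this wrong).
    """
    return parse_ceph_version(text) < FIXED_IN


# The build path from the traceback's rbd.pyx frame.
print(predates_fix("/build/ceph-XmVvyr/ceph-10.2.2/src/build/rbd.c"))
```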

Jason Dillaman (jdillaman) wrote :

Patch to devstack-plugin-ceph submitted [1].

[1] https://review.openstack.org/377118

Matt Riedemann (mriedem)
Changed in devstack-plugin-ceph:
status: New → In Progress
Mohammed Naser (mnaser) wrote :

We are running 10.2.2 as well, which confirms this. 10.2.3 was released a week ago, so instead of the workaround, perhaps you could look into running CI against it?

We're still waiting for the Storage SIG to release the 10.2.3 packages. For Xenial, though, you might be able to get the debs directly here:

https://download.ceph.com/debian-jewel/pool/main/c/ceph/
