copy_from test case logic is invalid

Bug #955527 reported by Jay Pipes
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance
Fix Released
High
Eoghan Glynn

Bug Description

In the TestCopyToFile._do_test_copy_from() method, there is some faulty logic it seems:

        # DELETE original image
        path = "http://%s:%d/v1/images/%s" % ("0.0.0.0", self.api_port,
                                              original_image_id)
        http = httplib2.Http()
        response, content = http.request(path, 'DELETE')
        self.assertEqual(response.status, 200)

        # GET image again to make sure the existence of the original
        # image in from_store is not depended on
        path = "http://%s:%d/v1/images/%s" % ("0.0.0.0", self.api_port,
                                              copy_image_id)
        http = httplib2.Http()
        response, content = http.request(path, 'GET')
        self.assertEqual(response.status, 200)

The problem is that the deletion path for the original image is not correct. It needs to be a DELETE call against the Swift or S3 remote backend image, not the local Glance server.

This causes the following failure:

FAIL: Ensure we can copy from an external image in S3.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/utils.py", line 166, in _runner
    func(*args, **kw)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/utils.py", line 184, in wrapped
    func(*a, **kwargs)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/test_swift.py", line 542, in test_copy_from_s3
    self._do_test_copy_from('s3', get_s3_uri)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/test_swift.py", line 511, in _do_test_copy_from
    self.assertEqual(response.status, 200)
AssertionError: 404 != 200
>> raise self.failureException, \
          (None or '%r != %r' % (404, 200))

Eoghan Glynn (eglynn)
Changed in glance:
assignee: nobody → Eoghan Glynn (eglynn)
Revision history for this message
Eoghan Glynn (eglynn) wrote :

The point of that delete in the test is to ensure that the copied image (in swift) is independent of the original image (in s3).

Now I think it should be OK for the deletion of the *original* image to be mediated by the glance API layer, to avoid the test being directly boto-aware.

But the fact that the subsequent GET of the copied image ID fails with a 404 indicates either:

 (a) the copy isn't really independent of the existence of the original,

or:

 (b) swift is intermittently broken.

Evidence pointing to option (b) includes the fact that the mysterious 404 returned by swift afflicts other tests intermittently, e.g TestS3.test_remote_image in https://jenkins.openstack.org/job/gate-glance-python26/85/console

I'll continue to dig tomorrow, but I suspect an intermittent swift issue may be the nub of the problem.

Eoghan Glynn (eglynn)
Changed in glance:
status: Confirmed → In Progress
Revision history for this message
Eoghan Glynn (eglynn) wrote :

Ignore the comment above about TestS3.test_remote_image(), as that test doesn't hit swift.

Revision history for this message
Eoghan Glynn (eglynn) wrote :

Here's an example of a similar failure where the destination store is file-based as opposed to swift (again S3 is the external source that's copied from).

======================================================================
FAIL: Ensure we can copy from an external image in S3.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/utils.py", line 166, in _runner
    func(*args, **kw)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/utils.py", line 184, in wrapped
    func(*a, **kwargs)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/test_copy_to_file.py", line 174, in test_copy_from_s3
    self._do_test_copy_from('s3', get_s3_uri)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/test_copy_to_file.py", line 142, in _do_test_copy_from
    self.assertEqual(response.status, 200)
AssertionError: 404 != 200
>> raise self.failureException, \
          (None or '%r != %r' % (404, 200))

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/5396

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.openstack.org/5396
Committed: http://github.com/openstack/glance/commit/6619298fc17b0cf8b9c0bd0be53162139c802eee
Submitter: Jenkins
Branch: master

commit 6619298fc17b0cf8b9c0bd0be53162139c802eee
Author: Eoghan Glynn <email address hidden>
Date: Thu Mar 15 13:44:56 2012 +0000

    Ensure copy and original image IDs differ.

    Ensure the copy_from tests have unmarshalled a distinct ID for the
    copied image, in order to eliminate the possibility that the subsequent
    deletion of the original image impacts on the existence of the copy.

    Eliminates one bizarre and unlikely explanation for the test failures
    reported in bug 955527.

    Change-Id: If8d0720f6d08cdc57a3e01431bd3460d95a868a9

Changed in glance:
status: In Progress → Fix Committed
Revision history for this message
Kevin L. Mitchell (klmitch) wrote :

The "fix" that was merged was just the addition of more tests to eliminate one possibility for the source of the bug, so I've set this back to "in progress".

Changed in glance:
status: Fix Committed → In Progress
Revision history for this message
Brian Waldon (bcwaldon) wrote :

Kevin: can you elaborate with what you expect to be done to properly fix this bug?

Revision history for this message
Eoghan Glynn (eglynn) wrote :

Brian: the problem is that we haven't identified the root issue as yet, though I think we can safely say the problem is not that the original image is being deleted via the glance API instead of directly via the S3 API (as speculated in the Bug Description above).

I'll continue with root cause analysis to see if I can diagnose it further. However, this is proving slow as the issue doesn't appear reproducible in my development evironment (only occurs intermittently on the Jenkins slaves).

Revision history for this message
Eoghan Glynn (eglynn) wrote :

Here's an interesting variant on the problem, that arose after many manually re-triggered Jenkins builds.

In this case the failure mode is different, in that the *first* GET on the copied image fails with 404, whereas previously the pattern was that the *second* GET on the copied would 404 after the original image had been DELETE'd.

======================================================================
FAIL: Ensure we can copy from an external image in S3.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/utils.py", line 184, in wrapped
    func(*a, **kwargs)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/test_s3.py", line 264, in test_copy_from_s3
    self._do_test_copy_from('s3', get_s3_uri)
  File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/test_s3.py", line 219, in _do_test_copy_from
    self.assertEqual(response.status, 200)
AssertionError: 404 != 200
>> raise self.failureException, \
          (None or '%r != %r' % (404, 200))

Revision history for this message
Eoghan Glynn (eglynn) wrote :

OK, so we can eliminate an unlikely ID collision as the root cause, as the latest GET copy/DELETE original/GET copy failure in the Jenkins build occurred after the new assertion to ensure the IDs are distinct.

Revision history for this message
Eoghan Glynn (eglynn) wrote :

Proposed a patch to capture the entire glance service logs for failing tests:

  https://review.openstack.org/5466

which should help us get to the bottom of this the S3-related failures.

Thierry Carrez (ttx)
tags: added: essex-rc-potential
Changed in glance:
milestone: essex-rc1 → none
Revision history for this message
Eoghan Glynn (eglynn) wrote :

A bit of detour into a theory that the issue was caused by eventual consistency only being supported by S3 in the US standard region, whereas read-after-write consistency is guaranteed in other regions such as eu-west-1.

This theory was disproved however by re-configuring the glance functional tests to use a bucket in eu-west-1, with the failure continueing to be observed.

Revision history for this message
Eoghan Glynn (eglynn) wrote :

Latest thought on this is a simple race between the S3 testcase body and S3 test teardown running near-conncurent in the Jenkins py26 and py27 builds.

A work-around using unique per-test bucket names is proposed to address the race condition:

  https://review.openstack.org/#change,5590

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/5590
Committed: http://github.com/openstack/glance/commit/56c7ba73deac0c2fa9ee1887e953428c40782114
Submitter: Jenkins
Branch: master

commit 56c7ba73deac0c2fa9ee1887e953428c40782114
Author: Eoghan Glynn <email address hidden>
Date: Tue Mar 20 22:07:02 2012 +0000

    Use unique per-test S3 bucket name.

    Addresses bug 955527

    Avoids a race between the S3 testcase body and the bucket clean-out
    in the S3 test teardown, running near concurrently in the Jenkins
    python26 and python27 builds.

    Change-Id: I10f013b02bb74db2cbfc692ed372121d21bca3fb

Changed in glance:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/5901

Brian Waldon (bcwaldon)
Changed in glance:
milestone: none → essex-rc2
tags: removed: essex-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (milestone-proposed)

Reviewed: https://review.openstack.org/5901
Committed: http://github.com/openstack/glance/commit/86dbfe18e6fcd2e70e8736bbf4f3e10eda9b05e2
Submitter: Jenkins
Branch: milestone-proposed

commit 86dbfe18e6fcd2e70e8736bbf4f3e10eda9b05e2
Author: Eoghan Glynn <email address hidden>
Date: Tue Mar 20 22:07:02 2012 +0000

    Use unique per-test S3 bucket name.

    Addresses bug 955527

    Avoids a race between the S3 testcase body and the bucket clean-out
    in the S3 test teardown, running near concurrently in the Jenkins
    python26 and python27 builds.

    Change-Id: I10f013b02bb74db2cbfc692ed372121d21bca3fb

Changed in glance:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in glance:
milestone: essex-rc2 → 2012.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.