data unavailability for non-ascii metadata, after python3 upgrade

Bug #2012531 reported by Edouard Dausque
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Tim Burke

Bug Description

Hello,

Context/explanation:
Swift (2.29) platform is running with python2.

A user has uploaded an object with non-ascii metadata:

curl -X PUT -H "x-object-meta-mtime: 1677510051.018921" -H "X-Auth-Token: $token" -H "X-Object-Meta-mymetadata: testñÞþ压制版.mp4" https://example.org/v1/AUTH_test/test/dummy.txt

On the object-server side, under python2, there is no issue:

curl -g -I -XHEAD "http://128.66.0.1:6001/disk-02-000/259455/AUTH_test/test/dummy.txt" --verbose
* About to connect() to 128.66.0.1 port 6001 (#0)
* Trying 128.66.0.1...
* connected
* Connected to 128.66.0.1 (128.66.0.1) port 6001 (#0)
> HEAD /disk-02-000/259455/AUTH_test/test/dummy.txt HTTP/1.1
> User-Agent: curl/7.26.0
> Host: 128.66.0.1:6001
> Accept: */*
>
* additional stuff not fine transfer.c:1042: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Length: 0
Content-Length: 0
< X-Backend-Timestamp: 1678714902.43428
X-Backend-Timestamp: 1678714902.43428
< Content-Type: text/plain
Content-Type: text/plain
< X-Object-Meta-Mymetadata: testñÞþ压制版.mp4
X-Object-Meta-Mymetadata: testñÞþ压制版.mp4

After switching to python3, the object-server no longer answers requests for this specific object:

curl -g -I -XHEAD "http://128.66.0.1:6001/disk-02-000/259455/AUTH_test/test/dummy.txt" --verbose
* About to connect() to 128.66.0.1 port 6001 (#0)
* Trying 128.66.0.1...
* connected
* Connected to 128.66.0.1 (128.66.0.1) port 6001 (#0)
> HEAD /disk-02-000/259455/AUTH_test/test/dummy.txt HTTP/1.1
> User-Agent: curl/7.26.0
> Host: 128.66.0.1:6001
> Accept: */*
>
* additional stuff not fine transfer.c:1042: 0 0
* Empty reply from server
* Connection #0 to host 128.66.0.1 left intact
curl: (52) Empty reply from server
* Closing connection #0

There is no stack trace in the logs, and the request is logged as 200 OK.

Tags: python3
Edouard Dausque (edausq) wrote :
Tim Burke (1-tim-z) wrote :

Can repro, though I *do* get a traceback in logs:

Mar 22 18:27:41 saio object-6040: 127.0.0.1 - - [22/Mar/2023:18:27:41 +0000] "HEAD /sdb4/6/AUTH_test/c/tox.ini" 200 2935 "HEAD https://saio/v1/AUTH_test/c/tox.ini" "tx444f20aa644e43f48d5bb-00641b489d" "proxy-server 28880" 0.0010 "-" 28867 0
Mar 22 18:27:41 saio object-6040: STDERR: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/eventlet/wsgi.py", line 607, in handle_one_response
    write(b'')
  File "/usr/local/lib/python3.10/dist-packages/eventlet/wsgi.py", line 493, in write
    towrite.append(six.b('%s: %s\r\n' % header))
  File "/usr/local/lib/python3.10/dist-packages/six.py", line 644, in b
    return s.encode("latin-1")
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 33-35: ordinal not in range(256) (txn: tx444f20aa644e43f48d5bb-00641b489d)
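
For anyone following along, the failure in that last frame can be reproduced outside of Swift. This is a minimal sketch, assuming the header line is built as `Name: value\r\n` (as the format string in the traceback suggests) and using the header from the original report; `encode_header_line` is a hypothetical stand-in for what `six.b()` does on Python 3. It also shows that the same value passes once it is in "WSGI string" form, meaning its UTF-8 bytes reinterpreted as latin-1 code points.

```python
# Header from the bug report, with the value as a real unicode string,
# i.e. what you get when the raw UTF-8 bytes on disk are decoded as UTF-8.
header = ('X-Object-Meta-Mymetadata', 'testñÞþ压制版.mp4')
line = '%s: %s\r\n' % header


def encode_header_line(text):
    """Hypothetical helper: encode a header line like six.b() on Python 3."""
    return text.encode('latin-1')


# The three CJK characters have code points above 255, so latin-1 fails.
try:
    encode_header_line(line)
    error = None
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# WSGI-string form: every character is in range(256), so encoding works
# and the bytes on the wire are the original UTF-8 bytes.
wsgi_value = header[1].encode('utf-8').decode('latin-1')
wire = encode_header_line('%s: %s\r\n' % (header[0], wsgi_value))
```

Note that ñ, Þ and þ are themselves valid latin-1 characters, so the error only points at the characters that follow them.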

Changed in swift:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Tim Burke (1-tim-z)
clayg (clay-gerrard) wrote :

Maybe we could start explicitly recording which version of python was running when the metadata was written, and then try to guess when decoding if the key isn't there (a configurable timestamp the operator can use to indicate the py2=>py3 transition, with a -1 default for "always py3").

Changed in swift:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/878558
Committed: https://opendev.org/openstack/swift/commit/780754096267851545f2aa97afb250064a2292e3
Submitter: "Zuul (22348)"
Branch: master

commit 780754096267851545f2aa97afb250064a2292e3
Author: Tim Burke <email address hidden>
Date: Fri Mar 24 13:27:24 2023 -0700

    Properly read py2 object metadata on py3

    Replicated, unencrypted metadata is written down differently on py2
    vs py3, and has been since we started supporting py3. Fortunately,
    we can inspect the raw xattr bytes to determine whether the pickle
    was written using py2 or py3, so we can properly read legacy py2 meta
    under py3 rather than hitting a unicode error.

    Closes-Bug: #2012531
    Change-Id: I5876e3b88f0bb1224299b57541788f590f64ddd4
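
The idea in the commit message can be illustrated with a rough sketch. This is not the code from the merged change. It relies on one fact about pickle itself: Python 3 has no native bytes opcode in protocol 2, so it pickles `bytes` as a call to `_codecs.encode`, while Python 2 pickles its `str` with plain string opcodes. Everything else is an assumption made for illustration: the `PY3_MARKER` constant, the helper names, the use of protocol 2, and the premise that py2 stored raw UTF-8 bytes while py3 stored the UTF-8 encoding of the WSGI string.

```python
import pickle

# Assumption for this sketch: py3 protocol-2 pickles of bytes reference
# _codecs.encode, which a py2 pickle of str never does.
PY3_MARKER = b'_codecs\nencode'


def py2_style_pickle(name, value):
    """Hand-build the kind of pickle py2 emits for {str: str} (hypothetical)."""
    def short_binstring(data):
        return b'U' + bytes([len(data)]) + data  # only valid for len < 256
    # PROTO 2, EMPTY_DICT, key, value, SETITEM, STOP
    return b'\x80\x02}' + short_binstring(name) + short_binstring(value) + b's.'


def read_metadata(raw):
    """Return values as WSGI strings regardless of which python wrote them."""
    from_py3 = PY3_MARKER in raw
    loaded = pickle.loads(raw, encoding='bytes')
    result = {}
    for key, value in loaded.items():
        # py3 data: UTF-8 encoding of a WSGI string; py2 data: raw bytes.
        codec = 'utf-8' if from_py3 else 'latin-1'
        result[key.decode('utf-8')] = value.decode(codec)
    return result


original = 'testñÞþ压制版.mp4'
wsgi = original.encode('utf-8').decode('latin-1')
name = b'X-Object-Meta-Mymetadata'

py2_blob = py2_style_pickle(name, original.encode('utf-8'))
py3_blob = pickle.dumps({name: wsgi.encode('utf-8')}, protocol=2)

from_py2 = read_metadata(py2_blob)
from_py3 = read_metadata(py3_blob)
```

Under these assumptions both blobs decode to the same latin-1-safe string, which is what the WSGI server needs in order to write the header back out.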

Changed in swift:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.32.0

This issue was fixed in the openstack/swift 2.32.0 release.
