Need full unicode support for Swift

Bug #1008940 reported by Edward
This bug affects 4 people
Affects                        Status        Importance  Assigned to       Milestone
OpenStack Dashboard (Horizon)  Fix Released  Medium      Tihomir Trifonov
python-swiftclient             Fix Released  Undecided   Tihomir Trifonov

Bug Description

This error can be reproduced in two cases:
1. Copy an object to or from a container whose name contains a Unicode character, e.g. a Japanese or Chinese character.
2. Copy an object whose own name contains a Unicode character.

Revision history for this message
Edward (zhang-hare) wrote :
Vincent Hou (houshengbo) wrote :

I need to add one more scenario: uploading a local file with a non-ASCII name also raises this error.

UnicodeEncodeError at /nova/containers/first/upload

'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Request Method: POST
Request URL: http://9.119.148.161/nova/containers/first/upload
Django Version: 1.4
Exception Type: UnicodeEncodeError
Exception Value:

'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Exception Location: /usr/lib/python2.7/httplib.py in putheader, line 938
Python Executable: /usr/bin/python
Python Version: 2.7.3
Python Path:

['/opt/stack/horizon/openstack_dashboard/wsgi/../..',
 '/opt/stack/python-keystoneclient',
 '/opt/stack/python-novaclient',
 '/opt/stack/python-openstackclient',
 '/usr/local/lib/python2.7/dist-packages/cliff-0.7-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/tablib-0.9.11-py2.7.egg',
 '/opt/stack/keystone',
 '/usr/local/lib/python2.7/dist-packages/WebOb-1.0.8-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/pam-0.1.4-py2.7.egg',
 '/opt/stack/glance',
 '/usr/local/lib/python2.7/dist-packages/jsonschema-0.2-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/pysendfile-2.0.0-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/boto-2.1.1-py2.7.egg',
 '/opt/stack/nova',
 '/opt/stack/horizon',
 '/opt/stack/python-quantumclient',
 '/opt/stack/quantum',
 '/usr/local/lib/python2.7/dist-packages/python_gflags-1.3-py2.7.egg',
 '/opt/stack/swift',
 '/opt/stack/swift3',
 '/usr/local/lib/python2.7/dist-packages',
 '/opt/stack/python-glanceclient',
 '/usr/lib/python2.7',
 '/usr/lib/python2.7/plat-linux2',
 '/usr/lib/python2.7/lib-tk',
 '/usr/lib/python2.7/lib-old',
 '/usr/lib/python2.7/lib-dynload',
 '/usr/local/lib/python2.7/dist-packages',
 '/usr/lib/python2.7/dist-packages',
 '/usr/lib/python2.7/dist-packages/PIL',
 '/usr/lib/python2.7/dist-packages/gst-0.10',
 '/usr/lib/python2.7/dist-packages/gtk-2.0',
 '/usr/lib/pymodules/python2.7',
 '/usr/lib/python2.7/dist-packages/ubuntu-sso-client',
 '/usr/lib/python2.7/dist-packages/ubuntuone-client',
 '/usr/lib/python2.7/dist-packages/ubuntuone-control-panel',
 '/usr/lib/python2.7/dist-packages/ubuntuone-couch',
 '/usr/lib/python2.7/dist-packages/ubuntuone-installer',
 '/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol',
 '/opt/stack/horizon/openstack_dashboard']

Vincent Hou (houshengbo)
affects: swift → nova
Vincent Hou (houshengbo)
affects: nova → horizon
Vincent Hou (houshengbo) wrote :

In horizon/horizon/api/swift.py there is a method called swift_copy_object.
It catches an exception if any of the container names or object names cannot be converted to str. This may be an issue for Horizon.

Edward (zhang-hare)
Changed in horizon:
assignee: nobody → Edward (zhang-hare)
Devin Carlen (devcamcar) wrote :

The error message comes from Horizon, but the underlying issue is with Swift. We are displaying the message from Horizon because it was better than the evil message that was coming from Swift, but all we could do was work around it in that way.

I'll add affects -> swift and find out what the story there is.

Changed in horizon:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → folsom-3
Tihomir Trifonov (ttrifonov) wrote :

The problem is described here:

https://bugs.launchpad.net/horizon/+bug/1009314

The reason Unicode fails is that filenames are being sent as HTTP headers, and cloudfiles doesn't perform quoting in ISO-8859-1 for Unicode values. So if cloudfiles finally gets fixed, both bugs will be fixed.

Gabriel Hurley (gabriel-hurley) wrote :

I just tried this with python-swiftclient and found even worse results than with cloudfiles. I'm bumping this out of Horizon's milestones until such time as swift wants to address their unicode support in their clients/API.

Changed in horizon:
milestone: folsom-3 → none
gholt (gholt) wrote :

If you could provide a curl command that reproduces the problem, it would really help us out. If not, the python-swiftclient command would help as well.

Mike Barton (redbo)
affects: swift → python-swiftclient
Tihomir Trifonov (ttrifonov) wrote :

As I see from the source, python-swiftclient also uses HTTPConnection to connect, and the HTTP headers are not encoded in ISO-8859-1 as required. Thus any non-ASCII --meta value will fail, as the meta values are passed as HTTP headers.

Here is an example:

    swift post CONTAINER OBJECT -m X-Object-Meta-Orig-Filename:幸福
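
To make the header problem concrete, here is a minimal sketch in Python 3 (the thread itself runs Python 2.7). It assumes the client library encodes text header values as ISO-8859-1 before sending them, as Python 3's http.client does; the helper name can_send_as_latin1 is hypothetical and not part of python-swiftclient.

```python
def can_send_as_latin1(value):
    """Return True if a text header value survives ISO-8859-1 encoding."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        # Characters outside ISO-8859-1 (e.g. CJK) cannot be sent as-is.
        return False
    return True


print(can_send_as_latin1("OBJECT"))
print(can_send_as_latin1("幸福"))
```

A value that fails this check would need to be encoded and quoted before it goes into a header.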

gholt (gholt) wrote :

What versions of swift and python-swiftclient are you using? I just tried with the master of both and the command worked:

Started with a fresh, empty account:

$ echo test > OBJECT
$ swift upload CONTAINER OBJECT
OBJECT
$ swift post CONTAINER OBJECT -m X-Object-Meta-Orig-Filename:幸福
$ swift stat CONTAINER OBJECT
       Account: AUTH_test
     Container: CONTAINER
        Object: OBJECT
  Content Type: application/octet-stream
Content Length: 5
 Last Modified: Tue, 14 Aug 2012 22:37:44 GMT
          ETag: d8e8fca2dc0f896fd7cb4cb0031ba249
Meta X-Object-Meta-Orig-Filename: 幸福
 Accept-Ranges: bytes
   X-Timestamp: 1344983864.32925

Edward (zhang-hare) wrote :

I found the following code, which shows where the error message comes from.

/opt/stack/horizon/horizon/api/swift.py:

def swift_copy_object(request, orig_container_name, orig_object_name,
                      new_container_name, new_object_name):
    try:
        # FIXME(gabriel): Cloudfiles currently fails at unicode in the
        # copy_to method, so to provide a better experience we check for
        # unicode here and pre-empt with an error message rather than
        # letting the call fail.
        str(orig_container_name)
        str(orig_object_name)
        str(new_container_name)
        str(new_object_name)
    except UnicodeEncodeError:
        raise exceptions.HorizonException(_("Unicode is not currently "
                                            "supported for object copy."))
    container = swift_api(request).get_container(orig_container_name)

    if swift_object_exists(request, new_container_name, new_object_name):
        raise exceptions.AlreadyExists(new_object_name, 'object')

    orig_obj = container.get_object(orig_object_name)
    return orig_obj.copy_to(new_container_name, new_object_name)

gholt (gholt) wrote :

Yeah, I'm pretty sure there's just a misunderstanding somewhere along the line.

According to the code I see in Swift, it has supported Unicode names with COPY/Destination (and PUT/X-Copy-From) for quite some time (since the beginning of OpenStack?) but all strings sent to Swift need to be UTF-8 and URL encoded.

There was a problem with this with the large object support (X-Object-Manifest) but that was fixed about a month and a half ago -- not exactly related to this particular bug.

The key here is: Path strings and header values sent to Swift need to be UTF-8 and URL encoded -- sometimes you can get away with not doing it, but it's not a supported thing when you do.
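
As a rough illustration of that rule for a COPY request, the sketch below builds the request path and the Destination header from plain names. It is written for Python 3 (urllib.parse.quote encodes text as UTF-8 by default), and build_copy_request is a hypothetical helper, not something Swift or python-swiftclient provides.

```python
from urllib.parse import quote


def build_copy_request(account, src_container, src_object,
                       dst_container, dst_object):
    """Return (path, headers) with every name UTF-8 and URL encoded."""
    # quote() encodes text as UTF-8 first, then percent-encodes the bytes.
    path = "/v1/%s/%s/%s" % (
        quote(account), quote(src_container), quote(src_object))
    destination = "/%s/%s" % (quote(dst_container), quote(dst_object))
    return path, {"Destination": destination}


path, headers = build_copy_request(
    "AUTH_test", "CONTAINER", "幸福", "CONTAINER", "copied")
print(path)
print(headers)
```

Both the path and the header end up as pure ASCII, so nothing downstream has to guess at an encoding.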

For instance, the previous example should really be using %E5%B9%B8%E7%A6%8F for the Unicode value. The following quick snippet should show this (but I guess it depends on your Terminal; mine UTF-8 encodes strings from the command line):

python -c 'import sys, urllib; print urllib.quote(sys.argv[-1])' 幸福
%E5%B9%B8%E7%A6%8F

With that as the UTF-8/URL-Encoded string, the example should've been:

$ echo test > OBJECT
$ swift upload CONTAINER OBJECT
OBJECT
$ swift post CONTAINER OBJECT -m X-Object-Meta-Orig-Filename:%E5%B9%B8%E7%A6%8F
$ swift stat CONTAINER OBJECT
       Account: AUTH_test
     Container: CONTAINER
        Object: OBJECT
  Content Type: application/octet-stream
Content Length: 5
 Last Modified: Wed, 15 Aug 2012 01:17:08 GMT
          ETag: d8e8fca2dc0f896fd7cb4cb0031ba249
Meta X-Object-Meta-Orig-Filename: %E5%B9%B8%E7%A6%8F
 Accept-Ranges: bytes
   X-Timestamp: 1344993428.16163
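
Since the metadata value is stored in its quoted form, whatever reads it back has to unquote it before showing it to a user. A minimal Python 3 sketch of that step, with the hypothetical helper name decode_meta_value:

```python
from urllib.parse import unquote


def decode_meta_value(stored):
    """Turn a UTF-8/URL-encoded metadata value back into readable text."""
    # unquote() percent-decodes and interprets the bytes as UTF-8 by default.
    return unquote(stored)


print(decode_meta_value("%E5%B9%B8%E7%A6%8F"))
```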

Let's do another example with the value as the object name, including a copy/destination and a put/x-copy-from.

$ curl -XPUT http://127.0.0.1:8080/v1/AUTH_test/CONTAINER/%E5%B9%B8%E7%A6%8F -Hx-auth-token:AUTH_tkc10eb54529004c0d91e5ca3310ee7a09 --data-binary 'test'
201 Created
$ curl -XCOPY http://127.0.0.1:8080/v1/AUTH_test/CONTAINER/%E5%B9%B8%E7%A6%8F -Hx-auth-token:AUTH_tkc10eb54529004c0d91e5ca3310ee7a09 -Hdestination:/CONTAINER/copied
201 Created
$ curl -XPUT http://127.0.0.1:8080/v1/AUTH_test/CONTAINER/x-copy-fromed -Hx-auth-token:AUTH_tkc10eb54529004c0d91e5ca3310ee7a09 -Hx-copy-from:/CONTAINER/%E5%B9%B8%E7%A6%8F -Hcontent-length:0
201 Created
$ curl http://127.0.0.1:8080/v1/AUTH_test/CONTAINER?format=json -Hx-auth-token:AUTH_tkc10eb54529004c0d91e5ca3310ee7a09 -s | python -mjson.tool
[
    {
        "bytes": 5,
        "content_type": "application/octet-stream",
        "hash": "d8e8fca2dc0f896fd7cb4cb0031ba249",
        "last_modified": "2012-08-15T01:17:08.161630",
        "name": "OBJECT"
    },
    {
        "bytes": 4,
        "content_type": "application/x-www-form-urlencoded",
        "hash": "098f6bcd4621d373cade4e832627b4f6",
        "last_modified": "2012-08-15T01:30:58.755590",
        "name": "copied"
    },
    {
        "bytes": 4,
        "content_type": "application/x-www-form-urlencoded",
        "hash": "098f6bcd4621d373cade4e832627b4f6",
        "last_modified": "2012-08-15T01:32:51.507340",
        "name": "x-copy-fromed"
    },
    {
        "bytes": 4,
        "content_type": "application/x-www-form-urlencoded",
        "hash": "098f6bcd4621d373...


Tihomir Trifonov (ttrifonov) wrote :

Okay, there are two kinds of problems here.
The first is with the current python-swiftclient version. As a simple test, I installed the version from PyPI (1.1.1) and tried the following:

    swift post CONTAINER OBJECT -m X-Object-Meta-Orig-Filename:幸福

Everything works fine. This code uses the installed codebase from PyPI (/usr/local/lib/python2.7/dist-packages/swiftclient for me).

When I force the use of the latest master codebase in /opt/stack/swift instead, I get an error:

  File "/usr/lib/python2.7/httplib.py", line 955, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 807, in _send_output
    msg = "\r\n".join(self._buffer)

Comparing self._buffer in v1.1.1 and v1.6 gives the following:

v1.6(master branch):

    [u'POST /v1/AUTH_29a1e3e9cbee4671a4d29cadb4b1a018/1231/%D0%BE%D0%B1%D0%B5%D0%BA%D1%82 HTTP/1.1', 'Host: 192.168.10.103:8080', 'Accept-Encoding: identity', 'X-Object-Meta-X-Object-Meta-Orig-Filename: \xe5\xb9\xb8\xe7\xa6\x8f', 'X-Auth-Token: 40813f9f7c9d4ee0968be3ba39b2058c', '', '']

v1.1.1(PyPI):

    ['POST /v1/AUTH_29a1e3e9cbee4671a4d29cadb4b1a018/1231/%D0%BE%D0%B1%D0%B5%D0%BA%D1%82 HTTP/1.1', 'Host: 192.168.10.103:8080', 'Accept-Encoding: identity', 'X-Object-Meta-X-Object-Meta-Orig-Filename: \xe5\xb9\xb8\xe7\xa6\x8f', 'X-Auth-Token: 9c56611636c24723b5cf2cf5106f2f0a', '', '']

So the only difference is that the v1.6 buffer mixes a unicode string (u'POST /v1/AUTH...') with a UTF-8 encoded byte string (the X-Object... header). "\r\n".join(self._buffer) then tries to decode the byte string as ASCII and fails with the error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 43: ordinal not in range(128)

So this problem should probably be addressed in python-swiftclient.
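
One way to avoid the mixed buffer is to normalise every line to UTF-8 bytes before joining. The sketch below is Python 3, where mixing text and bytes in a join raises TypeError instead of attempting an implicit ASCII decode as Python 2 does; to_utf8 and join_request_lines are hypothetical names, not swiftclient functions.

```python
def to_utf8(item):
    """Encode text to UTF-8 bytes; leave byte strings untouched."""
    if isinstance(item, str):
        return item.encode("utf-8")
    return item


def join_request_lines(lines):
    """Join request lines after making them all the same type (bytes)."""
    return b"\r\n".join(to_utf8(line) for line in lines)


buffer = [
    "POST /v1/AUTH_test/c/o HTTP/1.1",                           # text
    b"X-Object-Meta-Orig-Filename: \xe5\xb9\xb8\xe7\xa6\x8f",  # bytes
    "",
    "",
]
print(join_request_lines(buffer))
```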

Now heading to Horizon:

If the headers are only encoded:

    value.encode('utf-8')

but not quoted, then we hit the problem above when u'POST /v1/AUTH_xxxx...' is passed.
If the value is quoted:

    quote(v.encode('utf-8'))

Then everything works fine.
I'd suggest that both fixes be made in python-swiftclient, for one simple reason. swiftclient currently uses swift.common.bufferedhttp.BufferedHTTPConnection, with a possible fallback to eventlet.green.httplib.HTTPConnection and finally httplib.HTTPConnection. If that HTTPConnection implementation changes someday, it will receive URL-quoted headers even though it might be able to work with all kinds of headers itself (to perform internal validation). Then we'd have to remove the quoting and encoding from Horizon ...
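
For reference, the encode-then-quote step discussed above could look roughly like this when applied to a whole header dict. This is a Python 3 sketch with a hypothetical helper name, quote_header_values; it is not the code from the eventual patch.

```python
from urllib.parse import quote


def quote_header_values(headers):
    """Return a copy of headers with each value UTF-8 encoded and quoted."""
    quoted = {}
    for name, value in headers.items():
        if isinstance(value, str):
            value = value.encode("utf-8")
        # quote() accepts bytes and returns a pure-ASCII str.
        quoted[name] = quote(value)
    return quoted


print(quote_header_values({"X-Object-Meta-Orig-Filename": "幸福"}))
```

Doing this in one place inside the client means callers such as the dashboard or the CLI do not each need their own encoding logic.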

The code that is pasted above for 'swift_copy_object' is now obsolete and needs to be removed. It actually doesn't break anything itself, just notifies that there is something broken.

Tihomir Trifonov (ttrifonov) wrote :

Ah, one more thing: Horizon currently has this requirement for python-swiftclient:

python-swiftclient>1.1,<1.2

It is probably obsolete and should be removed. I'm not sure what its purpose is.

Gabriel Hurley (gabriel-hurley) wrote :

Steps to close:

  1. Verify that the command works with the currently released python-swiftclient.
  2. Make sure the data being passed to the client is correctly UTF-8 encoded and URL encoded.
  3. Verify that all swift commands work as such in Horizon.

Changed in horizon:
milestone: none → grizzly-1
assignee: Edward (zhang-hare) → Nebula (nebula)
summary: - Unicode is not currently supported for object copy.
+ Need full unicode support for Swift
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-swiftclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/14330

Changed in python-swiftclient:
assignee: nobody → Tihomir Trifonov (ttrifonov)
status: New → In Progress
Changed in horizon:
assignee: Nebula (nebula) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-swiftclient (master)

Reviewed: https://review.openstack.org/14330
Committed: http://github.com/openstack/python-swiftclient/commit/8b42f8a40c9a48f85b8d4d859afb4e28a510c036
Submitter: Jenkins
Branch: master

commit 8b42f8a40c9a48f85b8d4d859afb4e28a510c036
Author: Tihomir Trifonov <email address hidden>
Date: Thu Oct 11 15:04:00 2012 +0300

    Force utf-8 encode of HTTPConnection params

    This patch forces swiftclient to encode to utf-8
    all url and headers arguments, to avoid the
    UnicodeDecodeError which is raised by '\r\n'.join([])
    invoked in htplib.py.

    Currently the affected projects are Horizon(upload file
    with unicode name) and swiftclient CLI('swift post' with
    unicode filename as header)

    This is also a follow-up of this review:
        https://review.openstack.org/#/c/14216/

    I'd still want to hear what the Swift core devs
    think of it. Is it better to create a new
    AutoEncodingHTTPConnection? Or to handle the connection
    creation and make sure there are no unicode and utf-8
    string at the same time. If these unicode checks have to
    be added in the calling code(Dashboard, CLI), there are
    so many places to be added, and also in all new commands
    that might be exposed from the API.

    Fixes bug 1008940

    Change-Id: Ice2aa29024429d3e6f569a88d5cf8b4202537827

Changed in python-swiftclient:
status: In Progress → Fix Committed
Changed in horizon:
milestone: grizzly-1 → grizzly-2
Changed in horizon:
assignee: nobody → Tihomir Trifonov (ttrifonov)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to horizon (master)

Fix proposed to branch: master
Review: https://review.openstack.org/17845

Changed in horizon:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to horizon (master)

Reviewed: https://review.openstack.org/17845
Committed: http://github.com/openstack/horizon/commit/4b1fc167893ee0ab4ae5d57ad2ad051c2088dc02
Submitter: Jenkins
Branch: master

commit 4b1fc167893ee0ab4ae5d57ad2ad051c2088dc02
Author: Tihomir Trifonov <email address hidden>
Date: Tue Dec 11 12:02:06 2012 +0200

    Fixed unicode for object copy

    Removed deprecated check for unicode symbols
    in object names, fixed the reverse() for success_url,
    which expects "/" to be included in path arguments.

    Fixes bug 1008940

    Change-Id: I1122437c40f8e31b64a82b39cd326141842ca519

Changed in horizon:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in horizon:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in horizon:
milestone: grizzly-2 → 2013.1
Changed in python-swiftclient:
status: Fix Committed → Fix Released