buffer overrun in repr() for unicode strings

Bug #56633 reported by Benjamin C. Wiley Sittler
Affects             Status         Importance   Assigned to      Milestone
Python              Unknown        Unknown
python2.4 (Ubuntu)  Fix Released   Medium       Matthias Klose
Dapper              Fix Released   High         Martin Pitt

Bug Description

hi,

i discovered a bug yesterday in repr() for unicode strings. this
causes an unpatched non-debug wide (UTF-32/UCS-4) build of python to
abort:

python2.4 -c 'assert(repr(u"\U00010000" * 39 + u"\uffff" * 4096)) == (repr(u"\U00010000" * 39 + u"\uffff" * 4096))'

the problem is fixed by a change to unicodeobject.c. in the process of
fixing it i also found and fixed another bug in repr() on UCS-4 python
builds -- previously paired unicode surrogates were being repr()'ed as a
single "character" even though they are not treated as such by a UCS-4
python build -- i.e. eval(repr(u'\ud800\udc00')) != u'\ud800\udc00' in
an unpatched UCS-4 build.
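
The round-trip failure described above can be modelled in present-day Python 3, where strings are always sequences of code points, much like a UCS-4 build. This is only a sketch: `escape` and its `merge_surrogates` flag are hypothetical names, not the CPython code, and for simplicity the model escapes every character.

```python
import ast

def escape(s, merge_surrogates):
    """Hypothetical model of a repr()-style escaper; not the CPython code."""
    out = []
    i = 0
    while i < len(s):
        cp = ord(s[i])
        nxt = ord(s[i + 1]) if i + 1 < len(s) else 0
        if merge_surrogates and 0xD800 <= cp < 0xDC00 and 0xDC00 <= nxt <= 0xDFFF:
            # Pre-patch behaviour: fold the pair into one '\U00xxxxxx' escape.
            ucs = (((cp & 0x03FF) << 10) | (nxt & 0x03FF)) + 0x10000
            out.append("\\U%08x" % ucs)
            i += 2
            continue
        if cp >= 0x10000:
            out.append("\\U%08x" % cp)
        elif cp >= 0x100:
            out.append("\\u%04x" % cp)
        else:
            out.append("\\x%02x" % cp)
        i += 1
    return "'" + "".join(out) + "'"

pair = "\ud800\udc00"                      # two code points on a code-point build
merged = escape(pair, merge_surrogates=True)
separate = escape(pair, merge_surrogates=False)
round_trip_merged = ast.literal_eval(merged)
round_trip_separate = ast.literal_eval(separate)
print(merged, round_trip_merged == pair)       # merged form does not round-trip
print(separate, round_trip_separate == pair)   # per-unit form does
```

With merging enabled the pair comes back as a single code point, so the evaluated string no longer equals the original. Escaping each surrogate on its own keeps the round trip intact.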

Package: python2.4
Version: 2.4.3-7ubuntu2
Severity: important

when i run this command:

python -c "repr(u'\u24ea\u059c\u200a\U0001d77e\uff07\u202f\u0747\u202f
\U0001d56b\U0001d5b9\U0001d4e9\u20052\u14bf\U0001d7f8\u200a\U0001d795
\U0001d6e7Z\u2006\u2002\U0001d50a\uff27\u13c0\u2000\uff16\u0411\uff16
\U0001d7e7\uff4c\u2006\u2001\ufe39\u2008\u0313]\u2008\u3014\u3015')"

python aborts with the following backtrace and memory dump:

*** glibc detected *** python: realloc(): invalid next size: 0x081521e8 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7e8acd4]
/lib/tls/i686/cmov/libc.so.6(__libc_realloc+0xff)[0xb7e8cc5f]
python(_PyString_Resize+0x80)[0x8082b4b]
python[0x80991f7]
python(PyObject_Repr+0x58)[0x807d1fd]
python(PyEval_EvalFrame+0x4b37)[0x80b5270]
python(PyEval_EvalCodeEx+0x836)[0x80b65d6]
python(PyEval_EvalCode+0x57)[0x80b6640]
python(PyRun_SimpleStringFlags+0xa8)[0x80d8b7c]
python(Py_Main+0x685)[0x8055862]
python(main+0x22)[0x80550e2]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xd8)[0xb7e378b8]
python[0x8055041]
======= Memory map: ========
08048000-0811a000 r-xp 00000000 08:03 622736 /usr/bin/python2.4
0811a000-0813b000 rw-p 000d1000 08:03 622736 /usr/bin/python2.4
0813b000-081b5000 rw-p 0813b000 00:00 0 [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7d40000-b7d4a000 r-xp 00000000 08:03 376899 /lib/libgcc_s.so.1
b7d4a000-b7d4b000 rw-p 00009000 08:03 376899 /lib/libgcc_s.so.1
b7d68000-b7d9b000 r--p 00000000 08:03 82634 /usr/lib/locale/en_US.utf8/LC_CTYPE
b7d9b000-b7d9e000 r-xp 00000000 08:03 625529 /usr/lib/python2.4/lib-dynload/_locale.so
b7d9e000-b7d9f000 rw-p 00003000 08:03 625529 /usr/lib/python2.4/lib-dynload/_locale.so
b7d9f000-b7e22000 rw-p b7d9f000 00:00 0
b7e22000-b7f51000 r-xp 00000000 08:03 66543 /lib/tls/i686/cmov/libc-2.4.so
b7f51000-b7f53000 r--p 0012e000 08:03 66543 /lib/tls/i686/cmov/libc-2.4.so
b7f53000-b7f55000 rw-p 00130000 08:03 66543 /lib/tls/i686/cmov/libc-2.4.so
b7f55000-b7f58000 rw-p b7f55000 00:00 0
b7f58000-b7f7c000 r-xp 00000000 08:03 66547 /lib/tls/i686/cmov/libm-2.4.so
b7f7c000-b7f7e000 rw-p 00023000 08:03 66547 /lib/tls/i686/cmov/libm-2.4.so
b7f7e000-b7f80000 r-xp 00000000 08:03 68161 /lib/tls/i686/cmov/libutil-2.4.so
b7f80000-b7f82000 rw-p 00001000 08:03 68161 /lib/tls/i686/cmov/libutil-2.4.so
b7f82000-b7f83000 rw-p b7f82000 00:00 0
b7f83000-b7f85000 r-xp 00000000 08:03 66546 /lib/tls/i686/cmov/libdl-2.4.so
b7f85000-b7f87000 rw-p 00001000 08:03 66546 /lib/tls/i686/cmov/libdl-2.4.so
b7f87000-b7f96000 r-xp 00000000 08:03 68156 /lib/tls/i686/cmov/libpthread-2.4.so
b7f96000-b7f98000 rw-p 0000f000 08:03 68156 /lib/tls/i686/cmov/libpthread-2.4.so
b7f98000-b7f9a000 rw-p b7f98000 00:00 0
b7fb0000-b7fb7000 r--s 00000000 08:03 2130015 /usr/lib/gconv/gconv-modules.cache
b7fb7000-b7fb9000 rw-p b7fb7000 00:00 0
b7fb9000-b7fd2000 r-xp 00000000 08:03 2737127 /lib/ld-2.4.so
b7fd2000-b7fd4000 rw-p 00018000 08:03 2737127 /lib/ld-2.4.so
bf99b000-bf9b3000 rw-p bf99b000 00:00 0 [stack]
ffffe000-fffff000 ---p 00000000 00:00 0 [vdso]
Aborted

-- System Information:
Debian Release: testing/unstable
  APT prefers edgy-updates
  APT policy: (500, 'edgy-updates'), (500, 'edgy-security'), (500, 'edgy-backports'), (500, 'edgy')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/dash
Kernel: Linux 2.6.17-5-386
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages python2.4 depends on:
ii  libbz2-1.0         1.0.3-3         high-quality block-sorting file co
ii  libc6              2.4-1ubuntu8    GNU C Library: Shared libraries
ii  libdb4.4           4.4.20-6        Berkeley v4.4 Database Libraries [
ii  libncurses5        5.5-2ubuntu1    Shared libraries for terminal hand
ii  libncursesw5       5.5-2ubuntu1    Shared libraries for terminal hand
ii  libreadline5       5.1-7build1     GNU readline and history libraries
ii  libssl0.9.8        0.9.8b-2build1  SSL shared libraries
ii  mime-support       3.36-1          MIME files 'mime.types' & 'mailcap
ii  python2.4-minimal  2.4.3-7ubuntu2  A minimal subset of the Python lan

python2.4 recommends no packages.

-- no debconf information

the patch is online here:

http://zoehep.xent.com/~bsittler/python2.4-2.4.3_unicodeobject.c.diff

and also inlined here and attached to this message:

--- Objects/unicodeobject.c 2006-03-27 23:32:36.000000000 -0800
+++ /home/bsittler/pkgs/python2.4-2.4.3_unicodeobject.c.patched 2006-08-16 12:37:19.000000000 -0700
@@ -1968,7 +1968,29 @@

     static const char *hexdigit = "0123456789abcdef";

-    repr = PyString_FromStringAndSize(NULL, 2 + 6*size + 1);
+    /* Initial allocation is based on the longest-possible unichr
+       escape.
+
+       In wide (UTF-32) builds '\U00xxxxxx' is 10 chars per source
+       unichr, so in this case it's the longest unichr escape. In
+       narrow (UTF-16) builds this is five chars per source unichr
+       since there are two unichrs in the surrogate pair, so in narrow
+       (UTF-16) builds it's not the longest unichr escape.
+
+       In wide or narrow builds '\uxxxx' is 6 chars per source unichr,
+       so in the narrow (UTF-16) build case it's the longest unichr
+       escape.
+
+    */
+
+    repr = PyString_FromStringAndSize(NULL,
+        2
+#ifdef Py_UNICODE_WIDE
+        + 10*size
+#else
+        + 6*size
+#endif
+        + 1);
     if (repr == NULL)
         return NULL;

@@ -1993,15 +2015,6 @@
 #ifdef Py_UNICODE_WIDE
         /* Map 21-bit characters to '\U00xxxxxx' */
         else if (ch >= 0x10000) {
-            int offset = p - PyString_AS_STRING(repr);
-
-            /* Resize the string if necessary */
-            if (offset + 12 > PyString_GET_SIZE(repr)) {
-                if (_PyString_Resize(&repr, PyString_GET_SIZE(repr) + 100))
-                    return NULL;
-                p = PyString_AS_STRING(repr) + offset;
-            }
-
             *p++ = '\\';
             *p++ = 'U';
             *p++ = hexdigit[(ch >> 28) & 0x0000000F];
@@ -2014,8 +2027,8 @@
             *p++ = hexdigit[ch & 0x0000000F];
            continue;
         }
-#endif
-        /* Map UTF-16 surrogate pairs to Unicode \UXXXXXXXX escapes */
+#else
+        /* Map UTF-16 surrogate pairs to '\U00xxxxxx' */
        else if (ch >= 0xD800 && ch < 0xDC00) {
            Py_UNICODE ch2;
            Py_UCS4 ucs;
@@ -2040,6 +2053,7 @@
            s--;
            size++;
        }
+#endif

         /* Map 16-bit characters to '\uxxxx' */
         if (ch >= 256) {
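
The sizing arithmetic behind the first hunk can be checked with a small Python 3 model of the pre-patch wide-build loop. This is a sketch under stated assumptions, not the CPython code: `simulate_old_wide` and `patched_wide_capacity` are hypothetical names, the constants 2 and 1 are assumed to cover the leading `u'` and the closing quote, and only code points of 0x100 and above are modelled.

```python
def simulate_old_wide(code_points):
    """Model of the pre-patch wide-build sizing; returns (bytes_written, capacity)."""
    capacity = 2 + 6 * len(code_points) + 1   # initial allocation from the old code
    offset = 2                                # assumed: the leading u' prefix
    for cp in code_points:
        if cp >= 0x10000:
            # The only place the old code grew the buffer.
            if offset + 12 > capacity:
                capacity += 100
            offset += 10                      # '\U00xxxxxx'
        elif cp >= 0x100:
            offset += 6                       # '\uxxxx', written with no size check
        else:
            raise ValueError("model covers only code points >= 0x100")
    offset += 1                               # assumed: the closing quote
    return offset, capacity

def patched_wide_capacity(size):
    """Allocation used by the patch on wide builds: 10 bytes per unichr."""
    return 2 + 10 * size + 1

# Same input as the reproducer: 39 non-BMP characters, then 4096 BMP characters.
reproducer = [0x10000] * 39 + [0xFFFF] * 4096
written, capacity = simulate_old_wide(reproducer)
overrun = written - capacity
print(written, capacity, overrun)             # 24969 24813 156
```

In this model the 39 leading non-BMP characters each use 4 bytes more than the 6 budgeted for them. The resize check never fires because the write position is still far from the end of the buffer, so the 156-byte deficit only surfaces while the unchecked `\uxxxx` escapes are being written. With the patched allocation of 10 bytes per unichr, the same input fits.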


Revision history for this message
Simon Law (sfllaw) wrote :

Hi Benjamin,
Have you sent this upstream to the Python bug tracker on SourceForge?
If not, I'd suggest doing this so that they can merge it in. If you'd like
I can also send this up as well, but it seems like you'd like to be the one
to get credit for your patch.

Thanks.

Changed in python2.4:
status: Unconfirmed → Needs Info
Revision history for this message
Benjamin C. Wiley Sittler (bsittler) wrote :

i don't care who gets credit and i'm really busy. can you send it upstream?

btw, i've attached the patch.

Revision history for this message
Simon Law (sfllaw) wrote :
Changed in python2.4:
status: Needs Info → Confirmed
importance: Untriaged → Medium
Revision history for this message
Benjamin C. Wiley Sittler (bsittler) wrote :

thanks! let me know if they need any more info from the systems i tested this on.

Revision history for this message
Simon Law (sfllaw) wrote :

It appears that Georg Brandl has applied this patch.

It should show up in the next release of Python 2.4.

Revision history for this message
Benjamin C. Wiley Sittler (bsittler) wrote : Re: [Bug 56633] Re: buffer overrun in repr() for unicode strings

Thanks! Perhaps this will inspire me to write a patch for the buggy
UTF-7 codec...

On 8/22/06, Simon Law <email address hidden> wrote:
> It appears that Georg Brandl has applied this patch.
>
> It should show up in the next release of Python 2.4.
>
> ** Bug watch added: Python at Sourceforge #1541585
> http://sourceforge.net/tracker/index.php?aid=1541585&group_id=5470&atid=305470&func=detail
>
> ** Also affects: python (upstream) via
> http://sourceforge.net/tracker/index.php?aid=1541585&group_id=5470&atid=305470&func=detail
> Importance: Unknown
> Status: Unknown
>
> --
> buffer overrun in repr() for unicode strings
> https://launchpad.net/bugs/56633
>

Revision history for this message
Matthias Klose (doko) wrote :

fixed in 2.4.3-8ubuntu1

Changed in python2.4:
assignee: nobody → doko
status: Confirmed → Fix Released
Revision history for this message
Benjamin C. Wiley Sittler (bsittler) wrote :

this bug does not appear to be actually fixed in version
2.4.3-8ubuntu1 (i.e. the patch has not been applied):

$ apt-cache show python2.4 | fgrep Version
Version: 2.4.3-8ubuntu1
 Version 2.4 of the high-level, interactive object oriented language,
Python-Version: 2.4
$ python2.4 -c 'assert(repr(u"\U00010000" * 39 +u"\uffff" * 4096)) ==(repr(u"\U00010000" * 39 + u"\uffff" * 4096))'
*** glibc detected *** python2.4: realloc(): invalid next size: 0x081a2628 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7e0d38a]
/lib/tls/i686/cmov/libc.so.6(__libc_realloc+0xff)[0xb7e0dcbf]
python2.4(_PyString_Resize+0x91)[0x8084bb1]
python2.4[0x809c0c8]
python2.4(PyObject_Repr+0x65)[0x807edc5]
python2.4(PyEval_EvalFrame+0x4801)[0x80b8941]
python2.4(PyEval_EvalCodeEx+0x839)[0x80b9fc9]
python2.4(PyEval_EvalCode+0x57)[0x80ba037]
python2.4(PyRun_SimpleStringFlags+0xa8)[0x80dd3d8]
python2.4(Py_Main+0x684)[0x8055884]
python2.4(main+0x22)[0x80550f2]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xdc)[0xb7dba8cc]
python2.4[0x8055041]
======= Memory map: ========
08048000-08120000 r-xp 00000000 08:03 623527 /usr/bin/python2.4
08120000-08141000 rw-p 000d8000 08:03 623527 /usr/bin/python2.4
08141000-081b8000 rw-p 08141000 00:00 0 [heap]
b7b00000-b7b21000 rw-p b7b00000 00:00 0
b7b21000-b7c00000 ---p b7b21000 00:00 0
b7cc3000-b7ccd000 r-xp 00000000 08:03 2285687 /lib/libgcc_s.so.1
b7ccd000-b7cce000 rw-p 00009000 08:03 2285687 /lib/libgcc_s.so.1
b7ceb000-b7d1e000 r--p 00000000 08:03 82634 /usr/lib/locale/en_US.utf8/LC_CTYPE
b7d1e000-b7d21000 r-xp 00000000 08:03 635852 /usr/lib/python2.4/lib-dynload/_locale.so
b7d21000-b7d22000 rw-p 00003000 08:03 635852 /usr/lib/python2.4/lib-dynload/_locale.so
b7d22000-b7da5000 rw-p b7d22000 00:00 0
b7da5000-b7ed2000 r-xp 00000000 08:03 2362694 /lib/tls/i686/cmov/libc-2.4.so
b7ed2000-b7ed4000 r--p 0012c000 08:03 2362694 /lib/tls/i686/cmov/libc-2.4.so
b7ed4000-b7ed6000 rw-p 0012e000 08:03 2362694 /lib/tls/i686/cmov/libc-2.4.so
b7ed6000-b7ed9000 rw-p b7ed6000 00:00 0
b7ed9000-b7efd000 r-xp 00000000 08:03 2363095 /lib/tls/i686/cmov/libm-2.4.so
b7efd000-b7eff000 rw-p 00023000 08:03 2363095 /lib/tls/i686/cmov/libm-2.4.so
b7eff000-b7f01000 r-xp 00000000 08:03 2363110 /lib/tls/i686/cmov/libutil-2.4.so
b7f01000-b7f03000 rw-p 00001000 08:03 2363110 /lib/tls/i686/cmov/libutil-2.4.so
b7f03000-b7f04000 rw-p b7f03000 00:00 0
b7f04000-b7f06000 r-xp 00000000 08:03 2363094 /lib/tls/i686/cmov/libdl-2.4.so
b7f06000-b7f08000 rw-p 00001000 08:03 2363094 /lib/tls/i686/cmov/libdl-2.4.so
b7f08000-b7f17000 r-xp 00000000 08:03 2363105 /lib/tls/i686/cmov/libpthread-2.4.so
b7f17000-b7f19000 rw-p 0000f000 08:03 2363105 /lib/tls/i686/cmov/libpthread-2.4.so
b7f19000-b7f1b000 rw-p b7f19000 00:00 0
b7f31000-b7f38000 r--s 00000000 08:03 2130376 /usr/lib/gconv/gconv-modules.cache
b7f38000-b7f3a000 rw-p b7f38000 00:00 0
b7f3a000-b7f53000 r-xp 00000000 08:03 376899 /lib/ld-2.4.so
b7f53000-b7f55000 rw-p 00018000 08:03 376899 /lib/ld-2.4.so
bff0d000-bff23000 rw-p bff0d000 00:00 0 [stack]
ffffe000-fffff000 ---p 00000000 00:00 0 [vdso]
Aborted
$

On 8/25/06, Matthias Klose <email address hidden> w...


Revision history for this message
Simon Law (sfllaw) wrote :

Re-opening the bug.

Changed in python2.4:
status: Fix Released → Confirmed
Revision history for this message
Martin Pitt (pitti) wrote :

I will backport the fix to the stable releases once edgy is confirmed to be fixed. Matthias, what's the status on this? Thank you.

Changed in python2.4:
assignee: nobody → pitti
importance: Untriaged → High
status: Unconfirmed → In Progress
Revision history for this message
Matthias Klose (doko) wrote :

fixed in 2.4.3-8ubuntu2 (edgy)

Changed in python2.4:
status: Confirmed → Fix Released
Revision history for this message
Kees Cook (kees) wrote :

I've got debdiffs built, and will be sending to pitti shortly.

Revision history for this message
Benjamin C. Wiley Sittler (bsittler) wrote :

btw, i can confirm that this fix has been applied in edgy on the
following python interpreters:

python2.4 2.4.3-8ubuntu2
python2.5 2.5-0ubuntu1

however it's still broken in the following edgy python:

python2.3 2.3.5-15ubuntu1

and in dapper it's broken too, at least in the version i have installed:

python2.4 2.4.2-0ubuntu3

On Tue, 2006-10-03 at 01:54 +0000, Kees Cook wrote:
> I've got debdiffs built, and will be sending to pitti shortly.
>

Revision history for this message
Martin Pitt (pitti) wrote :

http://www.ubuntu.com/usn/usn-359-1

python2.3/edgy has recently been fixed as well. Thank you!

Changed in python2.4:
status: In Progress → Fix Released
This report contains Public Security information  
Everyone can see this security related information.
