Default Encoding

Bug #9931 reported by Scott James Remnant (Canonical)
Affects | Status | Importance | Assigned to | Milestone
python-defaults (Ubuntu) | Won't Fix | Low | Unassigned |
python3-defaults (Ubuntu) | Fix Released | Undecided | Unassigned |

Bug Description

If we're truly intending to go UTF-8 everywhere, we should change the default
encoding used by Python (set in site.py) from 'ascii' to 'utf-8'.
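
As a rough illustration of what the default encoding governs (a sketch, not taken from the report): when Python 2 converts a byte string to text without a named codec, it falls back to the default encoding, so non-ASCII bytes fail under 'ascii'. The helper name try_decode and the sample bytes are made up for this example. Naming the codec explicitly gives the same result whatever the default is.

```python
# Sample UTF-8 bytes for the word "cafe" with an accented e; illustrative only.
data = b"caf\xc3\xa9"

def try_decode(raw, encoding):
    """Decode raw bytes with the named codec; return None if they do not fit."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Byte 0xc3 is outside the ASCII range, so the 'ascii' codec rejects the data.
print(try_decode(data, "ascii") is None)
# The same bytes are valid UTF-8 and decode to four characters.
print(try_decode(data, "utf-8") == u"caf\u00e9")
```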

Tags: upstream
Revision history for this message
Steve Alexander (stevea) wrote :

Changing the default encoding in site.py from the Python default is asking for
trouble. This can break or change the behaviour of programs in subtle ways. I
think Ubuntu developers should make the case to the Python upstream that they
should change the default encoding to UTF-8, but should not do so for Ubuntu
until Python upstream does so.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

It was my understanding that the whole point of having the default encoding
*only* settable inside site.py was to stop programmers relying on any default
encoding for their scripts?

Has this backfired on them and resulted in a total assumption that it's C? I
would be oh-so-surprised :o)

Can we petition them to leave sys.setdefaultencoding() alone for any application
to play with too?
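
For context on why applications cannot simply call it: in Python 2, site.py deletes sys.setdefaultencoding once startup is finished, and Python 3 has no such function at all. A small check, using a made-up helper name, that should run on either major version:

```python
import sys

def can_set_default_encoding():
    """Return True if this interpreter still exposes sys.setdefaultencoding."""
    return hasattr(sys, "setdefaultencoding")

# Report the current default and whether it can still be changed at runtime.
print(sys.getdefaultencoding())
print(can_set_default_encoding())
```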

Revision history for this message
Matthias Klose (doko) wrote :

So what you really want is the possibility to play with it? I.e., depending
on an environment variable PYTHON_DEFAULT_ENCODING, set it to UTF-8? I don't like
changing the default for Ubuntu only. OTOH it's a config file to play with ...
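
A minimal sketch of how such a switch might look, assuming the variable name from the comment above. Python itself does not read PYTHON_DEFAULT_ENCODING, and pick_encoding is a hypothetical helper, not part of site.py. It only chooses a codec name; it does not change the interpreter's default.

```python
import codecs
import os

# Hypothetical variable name taken from the comment above; Python does not read it.
ENV_VAR = "PYTHON_DEFAULT_ENCODING"

def pick_encoding(environ=None, fallback="ascii"):
    """Return the codec named in ENV_VAR, or fallback if unset or unknown."""
    if environ is None:
        environ = os.environ
    requested = environ.get(ENV_VAR, "")
    if not requested:
        return fallback
    try:
        # codecs.lookup normalises aliases such as "UTF8" to "utf-8".
        return codecs.lookup(requested).name
    except LookupError:
        return fallback

print(pick_encoding({ENV_VAR: "UTF8"}))
print(pick_encoding({}))
```

Falling back to 'ascii' when the variable is unset or names an unknown codec keeps the upstream behaviour as the default, which is the concern raised earlier in the thread.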

Revision history for this message
Jason Toffaletti (jason) wrote :

The default encoding is still 'ascii' in dapper's python2.4 package.
Is this going to get changed? Has anyone talked to upstream?
There is some discussion about this on these blogs:

http://blog.ianbicking.org/illusive-setdefaultencoding.html
http://faassen.n--tree.net/blog/view/weblog/2005/08/02/0

Perhaps this should be discussed on ubuntu-devel?

Revision history for this message
Steve Alexander (stevea) wrote :

From reading these weblog articles, it is clear that the only sane way forward
is if Python upstream changes the default encoding.

Interestingly, Fredrik Lundh, a core Python developer, says that the hook to
set the default encoding should have been removed before a final release.

Revision history for this message
Phil Bull (philbull) wrote :

Have there been any developments with this bug? It's been NeedInfo for quite a while.

Thanks

Revision history for this message
Dennis Kaarsemaker (dennis) wrote :

Yes, Python upstream probably won't change the default encoding before Python 3.0.

Revision history for this message
Dennis Kaarsemaker (dennis) wrote : Patch

Drop postrm, patch postinst to remove existing symlink, install rulesfile in correct place.

Revision history for this message
Dennis Kaarsemaker (dennis) wrote :

meh -EBADBUG for the attachment...

Revision history for this message
Carthik Sharma (carthik) wrote :

At least a few folks seem to agree this is a bug. Changing to confirmed.

Changed in python2.3:
status: Needs Info → Confirmed
Revision history for this message
Steve Alexander (stevea) wrote :

Interesting. I felt it was pretty much agreed that Ubuntu should not change this until upstream Python changes.

To change this in Ubuntu and not for Python environments in general will cause problems, such as Python code written on an Ubuntu system not working correctly when used elsewhere, and thus Ubuntu being an unsuitable platform for testing Python code.

This goes against the goal of making Ubuntu a preferred platform for Python development.

Revision history for this message
Olivier Cortès (olive) wrote :

I understand the last statement, but I'd like some advice:

When I get input in Python from my UTF-8 terminal, if a global setting (like site.py) is telling me that input is UTF-8, from what source could I know that the input is coming in as UTF-8?

I think I should inspect the locale, then deduce the charset, and then set "something" globally so that my app knows every unicode() call should translate from UTF-8. Is that right?

If that's the case, knowing that my whole Ubuntu system is UTF-8 aware and configured for it, I thought it would be easier, and consistent too, for site.py to detect this and set it globally for Python.

My 2 cents: in site.py, I didn't change the default encoding (I left it as us-ascii), but I activated the "if 0:" lines, which find the locale and set the default encoding from the locale charset. In this case, when the locale is unset or its charset is unknown, the default goes back to us-ascii, and I find this very consistent.

What's your opinion? Should I look elsewhere to find an existing discussion of this topic?
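
The locale-based approach described above can be sketched at application level, without touching site.py. To my knowledge the disabled block in Python 2's site.py takes the charset from locale.getdefaultlocale(); this sketch swaps in locale.getpreferredencoding(False) instead. The helper names normalise and encoding_from_locale are made up for the example.

```python
import codecs
import locale

def normalise(name, fallback="ascii"):
    """Return the canonical codec name, or fallback if name is empty or unknown."""
    if not name:
        return fallback
    try:
        return codecs.lookup(name).name
    except LookupError:
        return fallback

def encoding_from_locale(fallback="ascii"):
    """Derive an encoding from the current locale settings."""
    # Passing False asks for the encoding without calling setlocale().
    return normalise(locale.getpreferredencoding(False), fallback)

print(encoding_from_locale())
```

An application would then pass the result explicitly, for example to bytes.decode(), rather than relying on a process-wide default.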

Revision history for this message
Matthias Klose (doko) wrote :

Reassigning the report to python; this won't be changed for any released Python version. It's likely to be changed by upstream in py3k.

Changed in python2.4:
importance: Medium → Low
Revision history for this message
Brent Newland (brent-newland) wrote :

Seems the consensus is that upstream should be the one to determine if it's beneficial to implement this change; mark as invalid?

Revision history for this message
Matthias Klose (doko) wrote :

> Seems the consensus is that upstream should be the one to determine if it's beneficial to implement this change; mark as invalid?

will change when python (>= 3) is the default.

Changed in python-defaults:
assignee: doko → nobody
Revision history for this message
Matthias Klose (doko) wrote :

fixed in python3-defaults, won't fix for python-defaults

Changed in python3-defaults (Ubuntu):
status: New → Fix Released
Changed in python-defaults (Ubuntu):
status: Confirmed → Won't Fix