Encoding problems (input and output) with UTF8 encoding/locale

Bug #182260 reported by era
This bug affects 3 people
Affects              Status    Importance  Assigned to  Milestone
typespeed (Debian)   New       Unknown
typespeed (Ubuntu)   Triaged   Medium      Unassigned

Bug Description

Binary package hint: typespeed

When running typespeed in a gnome-terminal window on this Feisty box, Finnish words display with a "?" placeholder wherever there is supposed to be an "ä" or "ö". Making matters worse, it is impossible to type an ä or an ö.

Tried both with the current locale (see below) and with "<email address hidden>", which is also installed.

Scandinavian characters otherwise work fine in X; obviously, irrespective of the general locale settings, I have Finnish keyboard selected in the X preferences.

vnix$ locale
LANG=en_AU.UTF-8
LANGUAGE=en_AU:en
LC_CTYPE="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_PAPER="en_AU.UTF-8"
LC_NAME="en_AU.UTF-8"
LC_ADDRESS="en_AU.UTF-8"
LC_TELEPHONE="en_AU.UTF-8"
LC_MEASUREMENT="en_AU.UTF-8"
LC_IDENTIFICATION="en_AU.UTF-8"
LC_ALL=

(I think the AU comes from bug #40107 which was present when this system was originally installed.)

era (era) wrote :

Sorry, I meant "<email address hidden>" with a dash before the 8 ...

The attached screenshot contains what is apparently supposed to be the Finnish word "järjestelmä". This type of mojibake is typical for applications which are hardcoded to use Latin-1.
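
A minimal sketch of how placeholder glyphs like these can arise, assuming the word list is stored as ISO-8859-1 and its bytes are written unchanged to a UTF-8 terminal. This is an assumption about the cause, not something confirmed from the typespeed source, and the function name is made up for illustration.

```python
def show_on_utf8_terminal(raw: bytes) -> str:
    """Decode bytes the way a UTF-8 terminal would, marking invalid bytes."""
    # errors="replace" substitutes U+FFFD (the replacement character)
    # for every byte sequence that is not valid UTF-8.
    return raw.decode("utf-8", errors="replace")


word = "järjestelmä"
latin1_bytes = word.encode("iso-8859-1")  # ä is the single byte 0xE4
utf8_bytes = word.encode("utf-8")         # ä is the two bytes 0xC3 0xA4

print(show_on_utf8_terminal(latin1_bytes))  # each ä becomes a replacement glyph
print(show_on_utf8_terminal(utf8_bytes))    # the word comes through intact
```

A lone 0xE4 byte is not valid UTF-8 on its own, so a UTF-8 terminal has nothing sensible to draw for it.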

Daniel T Chen (crimsun) wrote :

Is this symptom still reproducible in 8.10?

Changed in typespeed:
status: New → Incomplete
era (era) wrote :

Not this particular symptom, because it seems that all the ISO-8859-1 words in the dictionary are simply skipped.

I played five rounds and did not get a single word with ä or ö in it. A bit over 10% of the words in the file words.fin contain a character outside the range a-z (59/486 to be precise, excluding the first line of the file, which is just a label). After seeing a few hundred words, I started to recognize many of them as repeats, but none had the umlaut vowels (in Finnish, these are not technically umlauts, but you know which ones I mean).
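
To put a rough number on that, here is a small sketch using the 59/486 figure. It assumes the game picks words uniformly and independently, which may not match how typespeed actually selects them; the function name and the draw counts are illustrative only.

```python
from fractions import Fraction

# Figures from the comment above: 59 of the 486 words in words.fin
# contain a character outside a-z.
NON_ASCII_WORDS = 59
TOTAL_WORDS = 486


def prob_no_special_word(draws: int) -> float:
    """Chance that `draws` independent uniform picks all avoid the 59 words."""
    p_plain = Fraction(TOTAL_WORDS - NON_ASCII_WORDS, TOTAL_WORDS)
    return float(p_plain ** draws)


share = NON_ASCII_WORDS / TOTAL_WORDS
print(f"share of words with characters outside a-z: {share:.1%}")
for n in (50, 100, 200):  # 200 is an assumed stand-in for "a few hundred"
    print(n, prob_no_special_word(n))
```

Under those assumptions the chance of never seeing such a word shrinks quickly with the number of words shown, which is consistent with the words being skipped rather than merely unlucky.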

This is a freshly installed Intrepid amd64 system, with LANG=en_US.UTF-8 but Finnish keyboard preferences. I also tried LC_ALL=C and LC_ALL=fi_FI.UTF-8 (after installing the Finnish locale via System > Administration > Language Support), to no avail. It won't let me use fi_FI.ISO-8859-1, fi.ISO-8859-1 or fi_FI.ISO-8859-15, although they're in /usr/share/i18n/SUPPORTED, but this is probably a locale problem (see below; notice the error messages from locale).

Also I can't seem to strace the program.

bash$ grep ^fi /usr/share/i18n/SUPPORTED
fi_FI.UTF-8 UTF-8
fi_FI ISO-8859-1
fi_FI@euro ISO-8859-15
fil_PH UTF-8

bash$ LC_ALL=fi_FI.UTF-8 locale
LANG=en_US.UTF-8
LC_CTYPE="fi_FI.UTF-8"
LC_NUMERIC="fi_FI.UTF-8"
LC_TIME="fi_FI.UTF-8"
LC_COLLATE="fi_FI.UTF-8"
LC_MONETARY="fi_FI.UTF-8"
LC_MESSAGES="fi_FI.UTF-8"
LC_PAPER="fi_FI.UTF-8"
LC_NAME="fi_FI.UTF-8"
LC_ADDRESS="fi_FI.UTF-8"
LC_TELEPHONE="fi_FI.UTF-8"
LC_MEASUREMENT="fi_FI.UTF-8"
LC_IDENTIFICATION="fi_FI.UTF-8"
LC_ALL=fi_FI.UTF-8

vnix$ LC_ALL=fi_FI.ISO-8859-1 locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LC_CTYPE="fi_FI.ISO-8859-1"
LC_NUMERIC="fi_FI.ISO-8859-1"
LC_TIME="fi_FI.ISO-8859-1"
LC_COLLATE="fi_FI.ISO-8859-1"
LC_MONETARY="fi_FI.ISO-8859-1"
LC_MESSAGES="fi_FI.ISO-8859-1"
LC_PAPER="fi_FI.ISO-8859-1"
LC_NAME="fi_FI.ISO-8859-1"
LC_ADDRESS="fi_FI.ISO-8859-1"
LC_TELEPHONE="fi_FI.ISO-8859-1"
LC_MEASUREMENT="fi_FI.ISO-8859-1"
LC_IDENTIFICATION="fi_FI.ISO-8859-1"
LC_ALL=fi_FI.ISO-8859-1

vnix$ <email address hidden> locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>"
<email address hidden>

vnix$ <email address hidden> typespeed
typespeed: main: setlocale
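
The "typespeed: main: setlocale" message suggests that a setlocale() call failed, which is what the C library does when the requested locale has not been generated on the system. As a sketch under that assumption, the same check can be made from Python's standard locale module; the helper name is hypothetical.

```python
import locale


def locale_is_available(name: str) -> bool:
    """Return True if setlocale() accepts `name` on this system."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # Raised when the C library rejects the requested locale name.
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the previous setting


for candidate in ("fi_FI.UTF-8", "fi_FI.ISO-8859-1", "C"):
    print(candidate, locale_is_available(candidate))
```

The results depend on which locales are installed on the machine, so no particular output is implied for the Finnish entries.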

Changed in typespeed:
status: Incomplete → New
era (era) wrote :

Oh, and for the record, typing those keys still produces weird mojibake in the input field (it looks like an escape sequence such as [??, where each ? is the Unicode "unknown character" glyph, a question mark on top of a solid ball, as in the screenshot I posted earlier).
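
One way input like this can break is when a program reads the keyboard one byte at a time and treats each byte as a character, because in a UTF-8 terminal ä and ö each arrive as two bytes. Whether typespeed does this is an assumption. The sketch below only illustrates how single bytes can be reassembled into characters; the function name is made up.

```python
import codecs


def decode_keystrokes(byte_stream):
    """Yield complete characters from bytes that arrive one at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for value in byte_stream:
        # The decoder buffers an incomplete sequence and returns "" until
        # the final byte of the character has been fed in.
        text = decoder.decode(bytes([value]))
        if text:
            yield text


# "ä" reaches the program as the two bytes 0xC3 0xA4 in a UTF-8 terminal.
keys = list("sä".encode("utf-8"))
print(list(decode_keystrokes(keys)))
```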

Daniel Hahler (blueyed) wrote :

I confirm this, using de_DE.UTF-8.

Typing "ü" adds "MM-<" to the input box, "ß" becomes "M~" etc.

Also strings in the menu are displayed wrong, if they contain an umlaut:
 4. PunktestM-CM-$nde anzeigen
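
The "M-CM-$" in that menu line matches the caret/meta notation that `cat -v` uses for the two UTF-8 bytes of "ä" (0xC3 0xA4), which suggests the bytes are being shown individually rather than as one character. A small sketch of that notation follows; the function is illustrative, and unlike `cat -v` it also escapes tabs and newlines.

```python
def meta_notation(data: bytes) -> str:
    """Render bytes in the caret/meta style used by `cat -v`."""
    out = []
    for b in data:
        prefix = ""
        if b >= 0x80:  # high bit set: prefix "M-" and clear the bit
            prefix, b = "M-", b - 0x80
        if b < 0x20:  # control character: ^@ .. ^_
            out.append(f"{prefix}^{chr(b + 0x40)}")
        elif b == 0x7F:  # DEL
            out.append(f"{prefix}^?")
        else:  # printable ASCII
            out.append(f"{prefix}{chr(b)}")
    return "".join(out)


print(meta_notation("Punktestände".encode("utf-8")))  # PunktestM-CM-$nde
```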

"msgunfmt /usr/share/locale/de_DE/LC_MESSAGES/typespeed.mo" appears to display the strings properly as utf8.

Maybe typespeed sets up internal conversion in a wrong way?

I can make it fail on startup using the following, although the specified locale is invalid / not installed:
$ LANG=de_DE.latin1 typespeed
typespeed: main: setlocale

Changed in typespeed (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
summary: - Finnish keyboard doesn't work under X
+ Encoding problems (input and output) with at least de_DE and sv_FI
+ locales
Daniel Hahler (blueyed)
summary: - Encoding problems (input and output) with at least de_DE and sv_FI
- locales
+ Encoding problems (input and output) with UTF8 encoding/locale
Changed in typespeed (Debian):
status: Unknown → New