Menus for choosing language should have one option per available translation

Bug #693337 reported by Gunnar Hjalmarsson
This bug affects 1 person
Affects                     Status        Importance  Assigned to         Milestone
Ubuntu Translations         Fix Released  Low         Unassigned
gdm (Ubuntu)                Fix Released  Undecided   Gunnar Hjalmarsson
language-selector (Ubuntu)  Fix Released  Undecided   Gunnar Hjalmarsson

Bug Description

Binary package hint: language-selector

There are two GUI menus in Ubuntu for selecting language: one on the GDM login screen, and one in language-selector's "Language" tab. Both show basically the locales that are currently available on the system. However, the number of available translations is often just a fraction of the number of locales. Examples: there are six German locales, but just one German translation; there are two Swedish locales, but just one Swedish translation, etc.

OTOH, as regards the language menu on GDM's login screen, there may be locale variants among the options, such as ca_ES.utf8@valencia, which do represent separate translations, but are either not shown at all, or whose labels don't distinguish them from the locales of which they are variants (bug 408474 and bug 685619).

Obviously this is confusing to users, who can be assumed to expect one option per translation. Which locales, variables etc. are used behind the scenes is not relevant to the average user.

It should be noted that the "Language" menu in the Ubuntu GDM package has only recently (as part of the solution to bug 553162) been converted from a general locale picker to a pure language picker. The reasoning in this bug report may therefore not be applicable to GDM for other distributions.

Changed in language-selector (Ubuntu):
assignee: nobody → Gunnar Hjalmarsson (gunnarhj)
David Planella (dpm) wrote :

Hi Gunnar,

I understand the rationale, but I'm not sure I understand your proposal for a fix. Could you please elaborate?

 - If this is about showing all locales including variants in gdm, it should be a duplicate of bug 408474
 - If this is about showing them in language-selector, IIRC, it already does.

Thanks!

Changed in ubuntu-translations:
status: New → Incomplete
Gunnar Hjalmarsson (gunnarhj) wrote :

The lang-list.pl script illustrates the approach I'm suggesting in this bug report. Please feel free to download and run it on your computer:

perl lang-list.pl

Gunnar Hjalmarsson (gunnarhj) wrote :

Hi David,

The idea is about _refraining_ from showing all the locales, both in GDM and in language-selector's "Language" tab. (The tab in language-selector for other locale settings is the place for showing all the locales.) At the same time, the variants should be added to the GDM options as per bug 408474.

Right now I have four installed languages: English, Swedish, German and Catalan. While 'locale -a | grep \.utf8' lists 29 locales, running the lang-list.pl script gives me:

ca Catalan
ca_ES@valencia Catalan (Spain - Valencia)
en English
en_AU English (Australia)
en_CA English (Canada)
en_NZ English (New Zealand)
en_GB English (United Kingdom)
en_US English (United States)
de German
sv Swedish

These are some of my questions for the time being:

* Would this be a suitable approach for determining available languages
  for message translation?

* Is listing the /usr/share/locale-langpack directory a safe way to
  find available translations, or is there more to it?

* As regards languages with more than one translation: When the country
  is not specified, does it matter which of the locales is
  assigned to the LC_MESSAGES environment variable? If it does, how do
  we determine the main dialect of the respective language?

/ Gunnar
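
For illustration, the directory-listing idea from the second question can be sketched in Python. This is not lang-list.pl. It only assumes that each translation sits in a subdirectory named after its language code and holding an LC_MESSAGES folder with compiled .mo catalogs; the function name available_translations is made up for this sketch.

```python
import os


def available_translations(root="/usr/share/locale-langpack"):
    """Return the sorted names of subdirectories of `root` that hold
    at least one compiled message catalog (*.mo) under LC_MESSAGES."""
    found = []
    if not os.path.isdir(root):
        # Directory missing (e.g. not an Ubuntu system): nothing to offer.
        return found
    for name in sorted(os.listdir(root)):
        msg_dir = os.path.join(root, name, "LC_MESSAGES")
        if not os.path.isdir(msg_dir):
            continue
        # Count the entry only if it actually contains translations.
        if any(entry.endswith(".mo") for entry in os.listdir(msg_dir)):
            found.append(name)
    return found
```

Calling available_translations() with no argument lists the codes found on the running system; passing another directory lets the same check run against a test tree.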

Changed in ubuntu-translations:
status: Incomplete → New
Changed in language-selector (Ubuntu):
status: New → In Progress
Gunnar Hjalmarsson (gunnarhj) wrote :

Packages with a suggested solution to the language-selector side of this bug are available for testing at https://launchpad.net/~gunnarhj/+archive/language-menus

Gunnar Hjalmarsson (gunnarhj) wrote :

The language-selector branch with changes that address this bug (https://code.launchpad.net/~gunnarhj/language-selector/language-menu) also includes a few UI changes. The attachment language-selector-ui.png shows what it looks like.

Martin Pitt (pitti) wrote :

> * Would this be a suitable approach for determining available languages for message translation?

My first reaction to that was "but what if I actually want the Swiss German variant?". While it's unusual, we can in theory have e. g. Swiss or Austrian specific German translations. But on second thought, this solves it:

> * Is listing the /usr/share/locale-langpack directory a safe way to find available translations, or is there more to it?

This is actually a very nice idea. As long as we don't have actual country-specific variants of a language (like de), it wouldn't show them at all, but it would retain the country-specific variants that we really need (such as en_GB or pt_BR).

The only problem here is that /usr/share/locale-langpack/ is an Ubuntuism and not compatible with any other distribution or upstream, but that boat already left a while ago anyway :-) So this isn't a blocker.

> * As regards languages with more than one translation: When the country is not specified, does it matter which of the locales is assigned to the LC_MESSAGES environment variable?

No, it doesn't for messages. It is relevant for $LANG and other LC_* categories, of course.

I guess for the actual implementation in l-s and gdm we don't need to parse /usr/share/xml/iso-codes/, as these already have the translated names of the locales?

Thanks for working on this!

Arne Goetje (arnegoetje) wrote :

On 01/06/2011 07:55 PM, Martin Pitt wrote:
>> * Would this be a suitable approach for determining available
>> languages for message translation?
>
> My first reaction to that was "but what if I actually want the Swiss
> German variant?". While it's unusual, we can in theory have e. g.
> Swiss or Austrian specific German translations. But on second
> thought, this solves it:

It wouldn't be unusual if the German translation team hadn't decided to
unify those variants into Standard German. Swiss German grammar usage,
at least, differs from that in Germany.
(Gnucash for example has de_CH translations, but that's in universe.)

>> * Is listing the /usr/share/locale-langpack directory a safe way to
>> find available translations, or is there more to it?
>
> This is actually a very nice idea. As long as we don't actually have
> actual country specific variants of a language (like de), it
> wouldn't show them at all, but it would retain the country specific
> variants that we really need (such as en_GB or pt_BR).
>
> The only problem here is that /usr/share/locale-langpack/ is an
> Ubuntuism and not compatible with any other distribution or
> upstream, but that boat already left a while ago anyway :-) So this
> isn't a blocker.

What about /usr/share/locale/ ? That's at least where universe software
puts its translations. And there you may even find de_DE and en_US
folders... Therefore, always put ll_CC codes first in the LANGUAGE
variable string, followed by the country-less fallback, e.g. de_DE:de !
There is code for this in l-s.
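
A minimal sketch of that ordering, in Python. This is not the l-s code referred to above; language_priority is a hypothetical helper that only shows the "ll_CC first, then ll" rule for building a LANGUAGE value.

```python
def language_priority(locale_code):
    """Build a LANGUAGE-style priority string from a locale code.

    The ll_CC code comes first, followed by the country-less fallback.
    Any encoding (".utf8") or modifier ("@euro") is dropped before
    the string is built.
    """
    base = locale_code.split(".")[0].split("@")[0]
    lang = base.split("_")[0]
    if base == lang:
        # No country part given, so there is nothing to fall back from.
        return lang
    return f"{base}:{lang}"
```

With this helper, "de_DE" and "de_DE.utf8" both yield "de_DE:de", while a bare "de" is returned unchanged.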

>> * As regards languages with more than one translation: When the
>> country is not specified, does it matter which of the locales
>> is assigned to the LC_MESSAGES environment variable?
> No, it doesn't
> for messages. It is relevant for $LANG and other LC_* categories, of
> course.

Why not? Falling back to en_GB for "English", while the rest of the LC_*
and LANG is en_US, should surely be avoided.

> I guess for the actual implementation in l-s and gdm we don't need
> to parse /usr/share/xml/iso-codes/, as these already have the
> translated names of the locales?

hmm? l-s parses /usr/share/xml/iso-codes/ for exactly that reason.

> Thanks for working on this!

+1

Cheers
Arne
--
Arne Götje (高盛華) <email address hidden>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.

Martin Pitt (pitti) wrote :

Arne Goetje [2011-01-06 16:32 -0000]:
> What about /usr/share/locale/ ?

Good point, thanks, I missed that. I think we should offer

 - all languages/locales from /usr/share/locale-langpack/

plus

 - the intersection of /usr/share/locale/ and `locale -a`

Gunnar, how does that sound to you?
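
The combination proposed here can be sketched with plain set operations in Python. The sketch rests on one loudly stated assumption about what "intersection" means: an entry from /usr/share/locale counts if it equals an installed locale name with the encoding stripped, or if it is the bare language code of one. The names strip_encoding and offered_languages are hypothetical, and the inputs are ordinary lists (directory listings and `locale -a` output) so nothing is read from disk.

```python
import re


def strip_encoding(locale_name):
    """'de_DE.utf8' -> 'de_DE'; 'ca_ES.utf8@valencia' -> 'ca_ES@valencia'."""
    return re.sub(r"\.[^@]*", "", locale_name, count=1)


def offered_languages(langpack_entries, locale_entries, installed_locales):
    """Union of all langpack entries and those /usr/share/locale entries
    that match an installed locale (see the assumption in the lead-in)."""
    installed = {strip_encoding(name) for name in installed_locales}
    # Bare language codes of the installed locales, e.g. 'de' for 'de_CH'.
    installed_langs = {name.split("_")[0].split("@")[0] for name in installed}
    from_locale_dir = {
        entry for entry in locale_entries
        if entry in installed or entry in installed_langs
    }
    return sorted(set(langpack_entries) | from_locale_dir)
```

In this reading, a de_CH directory under /usr/share/locale shows up only when a de_CH locale is installed, while a stray fr directory without any French locale is ignored.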

> >> * As regards languages with more than one translation: When the
> >> country is not specified, does it matter which of the locales
> >> is assigned to the LC_MESSAGES environment variable?
> > No, it doesn't
> > for messages. It is relevant for $LANG and other LC_* categories, of
> > course.
>
> Why not? Falling back to en_GB for "English", while the rest of the LC_*
> and LANG is en_US, should surely be avoided.

en_US does specify a country, though. As we are always going to have
en_GB as an explicit variant, this case doesn't apply to English or
Portuguese. I thought this was for the case if we only have "de"
translations, then the country in LC_MESSAGES doesn't matter.

> > I guess for the actual implementation in l-s and gdm we don't need
> > to parse /usr/share/xml/iso-codes/, as these already have the
> > translated names of the locales?
>
> hmm? l-s parses /usr/share/xml/iso-codes/ for exactly that reason.

Right, but it does that already, and gdm doesn't (it'd take too long
during boot). I was referring to Gunnar's perl script which currently
parses those.

Thanks,

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Arne Goetje (arnegoetje) wrote :

On 01/07/2011 12:51 AM, Martin Pitt wrote:
> Arne Goetje [2011-01-06 16:32 -0000]:
>> What about /usr/share/locale/ ?
>
> Good point, thanks, I missed that. I think we should offer
>
> - all languages/locales from /usr/share/locale-langpack/
>
> plus
>
> - the intersection of /usr/share/locale/ and `locale -a`
>
> Gunnar, how does that sound to you?

Just keep in mind that each time a universe package gets installed or
removed, the list of available languages may change then.

>>>> * As regards languages with more than one translation: When the
>>>> country is not specified, does it matter which of the locales
>>>> is assigned to the LC_MESSAGES environment variable?
>>> No, it doesn't
>>> for messages. It is relevant for $LANG and other LC_* categories, of
>>> course.
>>
>> Why not? Falling back to en_GB for "English", while the rest of the LC_*
>> and LANG is en_US, should surely be avoided.
>
> en_US does specify a country, though. As we are always going to have
> en_GB as an explicit variant, this case doesn't apply to English or
> Portuguese. I thought this was for the case if we only have "de"
> translations, then the country in LC_MESSAGES doesn't matter.

right, missed that. I'll shut up now.

>>> I guess for the actual implementation in l-s and gdm we don't need
>>> to parse /usr/share/xml/iso-codes/, as these already have the
>>> translated names of the locales?
>>
>> hmm? l-s parses /usr/share/xml/iso-codes/ for exactly that reason.
>
> Right, but it does that already, and gdm doesn't (it'd take too long
> during boot). I was referring to Gunnar's perl script which currently
> parses those.

ah, OK. :)
Regarding gdm: why would a simple gettext call to iso-codes, like it is
done in l-s, for the list of available languages/countries be too slow
at boot time? Is there a notable time difference between calling the
translation for a string from iso-codes and calling for a translation
from its own translations?

Cheers
Arne

Martin Pitt (pitti) wrote :

Arne Goetje [2011-01-07 2:03 -0000]:
> Just keep in mind that each time a universe package gets installed or
> removed, the list of available languages may change then.

Right, that's the downside with this approach. It should only affect
some corner cases, as it'd be very unusual to not have any country
specific translations in the entirety of main, but one in a universe
package?

> Regarding gdm: why would a simple gettext call to iso-codes, like it is
> done in l-s, for the list of available languages/countries be too slow
> at boot time?

gettext is fine, of course. I meant the parsing of the large XML files
that is done in the demo perl script. I assume this is only for the
testing script, as it's quite easy to do.

Thanks,

Martin

Arne Goetje (arnegoetje) wrote :

On 01/07/2011 07:03 PM, Martin Pitt wrote:
> Arne Goetje [2011-01-07 2:03 -0000]:
>> Just keep in mind that each time a universe package gets installed or
>> removed, the list of available languages may change then.
>
> Right, that's the downside with this approach. It should only affect
> some corner cases, as it'd be very unusual to not have any country
> specific translations in the entirety of main, but one in a universe
> package?

Probably. You can take a look at the langpack tarball we get from
Launchpad for a complete list of available langcodes in main.

>> Regarding gdm: why would a simple gettext call to iso-codes, like it is
>> done in l-s, for the list of available languages/countries be too slow
>> at boot time?
>
> gettext is fine, of course. I meant the parsing of the large XML files
> that is done in the demo perl script. I assume this is only for the
> testing script, as it's quite easy to do.

OK.

Cheers
Arne

Gunnar Hjalmarsson (gunnarhj) wrote :

Thanks Martin and Arne for your valuable feedback! I agree that
/usr/share/locale should also be taken into consideration, so I modified
the l-s branch accordingly and wrote a new Perl script (lang-list2.pl).

There is also GDM, and since I don't know C, I'm not able to make the
equivalent changes to the C code - somebody else needs to do that,
unless it can wait till release 13.10 or something...

To me it looks like it's gui/simple-greeter/gdm-languages.c that needs
to be patched. One possibility that would minimize the necessary
changes to existing GDM files is that I write a script that takes a
list of available locales as arguments and outputs the desired list of
language options. Martin, I guess it's up to you. Even if Perl isn't an
option, e.g. due to efficiency concerns, maybe a shell script?

A few comments on your comments:

On 2011-01-06 17:32, Arne Goetje wrote:
> On 01/06/2011 07:55 PM, Martin Pitt wrote:
>>
>
> What about /usr/share/locale/ ? That's at least where universe
> software puts its translations. And there you may even find de_DE and
> en_US folders... Therefore always put ll_CC codes first in the
> LANGUAGE variable string, followed by the country-less fallback, i.e.
> de_DE:de !

Is the country-less fallback in LANGUAGE really necessary? My tests
indicate that gettext automatically goes to 'll' if it doesn't find a
translation in 'll_CC'. Do you know of a reproducible use case that
shows something else? (It should be noted that it doesn't work the
other way around; see bug 700213.)

This is what happens in the code I'm proposing:
* If there are only translations under 'll' for a language, the list of
  language options only includes 'll'. Personally I find it neater to
  not unnecessarily include country specific items.
* If there are translations under 'll_CC' for a language, the list of
  language options does not include 'll' (with the exception of 'en').
  This is because of the gettext bug I mentioned.
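
The two rules above can be sketched as a small Python function. This is neither the proposed l-s code nor lang-list2.pl; language_options is a hypothetical name, and the sketch assumes that only codes containing "_" count as country-specific, so a variant such as "ca@valencia" does not hide its bare language code.

```python
def language_options(translation_dirs):
    """Reduce a list of translation directory names to menu options.

    * Only 'll' translations exist  -> offer 'll'.
    * 'll_CC' translations exist    -> drop the bare 'll', except for 'en'.
    """
    by_lang = {}
    for code in translation_dirs:
        lang = code.split("_")[0].split("@")[0]
        by_lang.setdefault(lang, set()).add(code)

    options = []
    for lang, codes in by_lang.items():
        has_country = any("_" in code for code in codes)
        for code in codes:
            if code == lang and has_country and lang != "en":
                # The bare code is hidden once country-specific ones exist.
                continue
            options.append(code)
    return sorted(options)
```

Given de, en, en_GB, pt, pt_BR and sv as input, the bare pt is dropped in favour of pt_BR, while en stays next to en_GB because of the stated exception.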

> There is code for this in l-s.

I have seen makeEnvString() in LocaleInfo.py, which is used to generate
a LANGUAGE list in the special case when l-s doesn't find a stored
LANGUAGE value. For the above reason I, at least, don't see a need
to extend the use of that function. Please feel free to prove me wrong.
;-)

>>> * As regards languages with more than one translation: When the
>>> country is not specified, does it matter which of the locales
>>> is assigned to the LC_MESSAGES environment variable?
>>
>> No, it doesn't for messages. It is relevant for $LANG and other
>> LC_* categories, of course.

Actually, in the l-s branch I still propose that we introduce the
possibility to pick the main or origin country; primarily because
it feels better and may prevent user bewilderment, but also because it
would have prevented this special case surprise:
https://lists.ubuntu.com/archives/ubuntu-desktop/2010-December/002722.html
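
As a rough Python sketch of what picking the main or origin country could look like: the function below is not the language-selector implementation, and the main_countries dict is only a stand-in for whatever data the branch actually uses. The name language_to_locale and the fallback to the alphabetically first locale are assumptions made for illustration.

```python
def language_to_locale(language, installed_locales, main_countries):
    """Pick a locale name for LC_MESSAGES from a menu choice.

    language          -- 'de', 'en_GB', ...
    installed_locales -- names as printed by `locale -a`
    main_countries    -- hypothetical mapping such as {'de': 'DE'}
    """
    lang = language.split("_")[0]
    if "_" in language:
        wanted = language                       # country already chosen
    elif lang in main_countries:
        wanted = f"{lang}_{main_countries[lang]}"
    else:
        wanted = None

    # All installed locales that belong to the chosen language.
    candidates = sorted(
        name for name in installed_locales
        if name.split(".")[0].split("_")[0] == lang
    )
    if wanted:
        for name in candidates:
            if name.split(".")[0] == wanted:
                return name
    # No preference known or preferred locale missing: first match, if any.
    return candidates[0] if candidates else None
```

With a mapping of de to DE, choosing "German" resolves to de_DE.utf8 rather than whichever German locale happens to sort first.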

>> I guess for the actual implementation in l-s and gdm we don't need
>> to parse /usr/share/xml/iso-codes/, as these already have the
>> translated names of the locales?

Right, I was just playing around with the first test script; had no
intention to propose that we upload...


Gunnar Hjalmarsson (gunnarhj) wrote :

I propose a couple of preparatory changes in the linked GDM branch.

Changed in gdm (Ubuntu):
assignee: nobody → Gunnar Hjalmarsson (gunnarhj)
status: New → In Progress
Gunnar Hjalmarsson (gunnarhj) wrote :

Now there is a complete proposal for a fix of this bug. The most convenient way to check it out is to install the packages at https://launchpad.net/~gunnarhj/+archive/language-menus

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package language-selector - 0.12

---------------
language-selector (0.12) natty; urgency=low

  [ Gunnar Hjalmarsson ]
  * LanguageSelector/gtk/GtkLanguageSelector.py:
    - Show only options corresponding to available translations in the
      combo box on language-selector's "Language" tab (LP: #693337).
  * LanguageSelector/LanguageSelector.py:
    - Skip the encoding part in the dmrc "Language" value. It's not
      a locale name, so let's not give the impression it is.
  * data/LanguageSelector.ui:
    - Clearer labels to describe the second ("Text") tab.
    - Icon added to taskbar. Thanks to Pavol Klačanský (LP: #648109).
    - Texts that inform the user about the need to restart for changes
      to system settings to take effect (LP: #127356, #612991).
    - Ellipses removed from the labels on the "Apply System-Wide"
      buttons (LP: #531799).
    - Layout tweaking of the "Format" (previously "Text") tab
      (LP: #697606).
  * data/main-countries:
    - Provide main or origin country for languages with multiple country
      codes present among the languages' available locales.
  * LanguageSelector/utils.py:
    - Take main country code into account when language2locale()
      generates a locale name for LC_MESSAGES.
    - language2locale() rewritten to make use of other language-selector
      functions.

  [ Martin Pitt ]
  * LanguageSelector/gtk/GtkLanguageSelector.py: Update ListStore construction
    to also work with the next pygobject release.
 -- Gunnar Hjalmarsson <email address hidden> Fri, 28 Jan 2011 15:50:50 +0100

Changed in language-selector (Ubuntu):
status: In Progress → Fix Released
David Planella (dpm) wrote :

Good work with this.

I've noticed a problem which I've reported as bug 710148

Changed in ubuntu-translations:
status: New → Fix Released
importance: Undecided → Low
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gdm - 2.32.0-0ubuntu7

---------------
gdm (2.32.0-0ubuntu7) natty; urgency=low

  [ Gunnar Hjalmarsson ]
  * debian/patches/40_one_lang_option_per_translation.patch:
    - The option list in the language chooser changed so the items
      represent available translations instead of locales
      (LP: #693337).
    - setlocale() validation removed (not applicable).
    - Show locale variants in the list of language options
      (LP: #408474).
  * debian/patches/36_language_environment_settings.patch:
    - Skip the encoding part in the dmrc "Language" value. It's not
      a locale name, so let's not give the impression it is.
    - Take main country code into account when generating
      a locale name for LC_MESSAGES.

  [ Kees Cook ]
  * Restore 24_respect_system_minuid.patch: upstream does not handle
    reading login.defs yet (LP: #708911).
 -- Gunnar Hjalmarsson <email address hidden> Thu, 10 Feb 2011 10:07:31 +0100

Changed in gdm (Ubuntu):
status: In Progress → Fix Released
Gunnar Hjalmarsson (gunnarhj) wrote :

Fixes of this bug for Lucid and Maverick are now available in official backports packages. To make Synaptic check for backports updates you can do:

o System -> Administration -> Update Manager -> Settings...

o Select the "Updates" tab and check the "Unsupported updates" option.

More about Ubuntu backports:
https://help.ubuntu.com/community/UbuntuBackports
