The list of languages does not show some available languages consistently

Bug #710148 reported by David Planella
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ubuntu Translations | Fix Released | High | Unassigned |
language-selector (Ubuntu) | Fix Released | Undecided | Gunnar Hjalmarsson |

Bug Description

Binary package hint: language-selector

While working on the fix for bug 693337, I noticed that the detection of available translations to be shown in the list is not correct, at least in the Spanish case.

For example, I've got Spanish translations installed (but not used) in two systems. I've observed that:

* In one system "Spanish (Mexico)" and "Spanish (Puerto Rico)" are shown
* In the other system only "Spanish (Puerto Rico)" is shown
* In neither of them is the main "Spanish" (or "Spanish (Spain)") translation shown for selection

Gunnar Hjalmarsson (gunnarhj) wrote :

Thanks for your early feedback!

language-selector scans the /usr/share/locale-langpack and
/usr/share/locale directories. If all Spanish translations reside under
'es', language-selector shows the country independent "Spanish" item
only. If country specific Spanish translations are available (i.e.
'es_XX' directories exist), the country independent "Spanish" item is
_not_ shown. The issue reported in bug 700213 is the reason for 'hiding'
the "Spanish" item in the latter case.
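
The rule described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual language-selector code: the function names, the regular expression, and the choice to ignore codesets and modifiers are all assumptions made for the example.

```python
import os
import re
from collections import defaultdict

LOCALE_ROOTS = ("/usr/share/locale-langpack", "/usr/share/locale")

# Matches 'll' or 'll_CC' only; codesets and modifiers are ignored (assumption).
_NAME = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?$")


def scan_locale_names(roots=LOCALE_ROOTS):
    """Collect directory names that look like 'll' or 'll_CC'."""
    names = set()
    for root in roots:
        if not os.path.isdir(root):
            continue  # a missing root is skipped rather than raising
        for entry in os.listdir(root):
            if _NAME.match(entry) and os.path.isdir(os.path.join(root, entry)):
                names.add(entry)
    return names


def options_as_described(names):
    """Hide the bare 'll' item whenever any 'll_CC' directory exists."""
    by_lang = defaultdict(set)
    for name in names:
        match = _NAME.match(name)
        if not match:
            continue
        lang, country = match.groups()
        by_lang[lang].add(country)  # country is None for a bare 'll'
    options = []
    for lang, countries in by_lang.items():
        specific = sorted(c for c in countries if c)
        if specific:
            options.extend(f"{lang}_{c}" for c in specific)
        else:
            options.append(lang)
    return sorted(options)
```

With 'es', 'es_MX' and 'es_PR' present, `options_as_described` returns only 'es_MX' and 'es_PR', which is the behavior reported in this bug.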

Does that possibly explain the behavior you describe, David?

Another thing is whether the design logic makes sense. I think I took
for granted that in cases when country specific translations are
available, the main dialect ('es_ES' in this case) is one of them.
Apparently that assumption was a mistake.

To solve this bug, I think that the code should be modified, so that in
cases when country specific items are shown, the main dialect ('es_ES'
in this case) shall always be included, irrespective of whether there
are any translations under 'es_ES'.

Changed in language-selector (Ubuntu):
assignee: nobody → Gunnar Hjalmarsson (gunnarhj)
status: New → In Progress
David Planella (dpm) wrote :

Hi Gunnar,

As commented on bug 700213, I cannot reproduce it.

I'm not sure how you can detect the main dialect (afaik, there is no place where it says that es_ES or de_DE is the main one), so I'd say we should show all translations present, even if they don't have a language code.

In this particular case, we'd show:

es
es_MX
es_PR

Since otherwise, in the current form, users are being forced to use "es_MX" or "es_PR", as "es" alone is not shown.

Thanks for your work on this.

Changed in ubuntu-translations:
status: New → Triaged
importance: Undecided → High
David Planella (dpm) wrote :

Sorry, I wrote too quickly. I meant "if they don't have a *country* code" in my last comment.

Gunnar Hjalmarsson (gunnarhj) wrote :

On 2011-01-31 12:02, David Planella wrote:
> I'm not sure how you can detect the main dialect (afaik, there is no
> place where it says that es_ES or de_DE is the main one),

Good point. Considering the para in the gettext docs I quoted in bug
700213, I first thought that the gettext package includes such info, but
I failed to find it. Instead I created a 'home made' file which maps
certain languages with their "main dialects":
/usr/share/language-selector/data/main-countries

I created it for a slightly different purpose (setting LC_MESSAGES), but
this bug report made me realize that the file may be useful also when
generating lists of language options.

> so I'd say we should show all translations present, even if they
> don't have a country code.
>
> In this particular case, we'd show:
>
> es
> es_MX
> es_PR

In addition to the issue in bug 700213 (assuming I'm right...), there
are more subjective reasons why I personally would prefer a slightly
different solution. What you suggest implies that the UI would sometimes
show both an 'es' and an 'es_ES' option, which may be confusing to
users, and hence should be avoided.

Instead, as soon as there are translations in more than one dialect of a
language available, I'd prefer to include the country in all the options
for that language. So in this case I think we should show:

es_ES
es_MX
es_PR

Please note that all those options will make gettext look for
translations under .../es unless found under respective country specific
directory.
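
The fallback can be observed with Python's standard gettext module, whose catalogue lookup expands 'es_MX' into progressively less specific names. The sketch below exercises Python's implementation only, not the C library; the 'demo' domain and the `lookup` helper are hypothetical.

```python
import gettext
import os
import tempfile


def lookup(language, existing=("es",), domain="demo"):
    """Return the catalogue path (relative to a temporary locale dir) chosen for `language`."""
    with tempfile.TemporaryDirectory() as localedir:
        for name in existing:
            catalog_dir = os.path.join(localedir, name, "LC_MESSAGES")
            os.makedirs(catalog_dir)
            # gettext.find() only checks that the file exists,
            # so an empty placeholder file is enough for this demo.
            open(os.path.join(catalog_dir, domain + ".mo"), "wb").close()
        found = gettext.find(domain, localedir=localedir, languages=[language])
        return os.path.relpath(found, localedir) if found else None
```

Calling `lookup("es_MX")` with only an 'es' directory present resolves to the catalogue under 'es'; adding an 'es_MX' directory makes the country specific catalogue win.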

Only when there are no translations in other directories but
/usr/share/locale/es and /usr/share/locale-langpack/es, showing just
'es' is preferable IMO.
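
A rough Python sketch of the model argued for above follows. The `MAIN_COUNTRY` dictionary is a hypothetical excerpt (the real main-countries file and its format are not shown in this thread), and keeping the bare item when no main country is known is an assumption of the example, not something decided here.

```python
from collections import defaultdict

# Hypothetical excerpt of a language -> main country mapping.
MAIN_COUNTRY = {"es": "ES", "de": "DE"}


def proposed_options(names, main_country=MAIN_COUNTRY):
    """Show 'll' alone only if no 'll_CC' exists; otherwise show countries everywhere."""
    by_lang = defaultdict(set)
    for name in names:
        lang, _, country = name.partition("_")
        by_lang[lang].add(country)  # "" stands for a bare 'll' directory
    options = set()
    for lang, countries in by_lang.items():
        specific = {c for c in countries if c}
        if not specific:
            options.add(lang)  # only 'll' exists: show just 'll'
            continue
        options.update(f"{lang}_{c}" for c in specific)
        main = main_country.get(lang)
        if main:
            # Always include the main dialect, even without an 'll_CC' directory.
            options.add(f"{lang}_{main}")
        elif "" in countries:
            options.add(lang)  # no known main country: keep the bare item (assumption)
    return sorted(options)
```

For 'es', 'es_MX' and 'es_PR' this yields 'es_ES', 'es_MX' and 'es_PR'; for 'es' alone it yields just 'es'.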

> Since otherwise, in the current form, users are being forced to use
> "es_MX" or "es_PR", as "es" alone is not shown.

Indeed. A solution to this bug does require a modification of the code;
great that you caught the issue so fast.

Since there is a gdm merge proposal with similar code changes in the
pipeline, I modified it in accordance with the model I'm arguing for
above. Please feel free to install the development gdm package for Natty
to check out its behavior.
https://launchpad.net/~gunnarhj/+archive/language-menus

David Planella (dpm) wrote : Re: [Bug 710148] Re: The list of languages does not show some available languages consistently

On Tue, 01 Feb 2011 at 07:28 +0000, Gunnar Hjalmarsson wrote:
> On 2011-01-31 12:02, David Planella wrote:
> > I'm not sure how you can detect the main dialect (afaik, there is no
> > place where it says that es_ES or de_DE is the main one),
>
> Good point. Considering the para in the gettext docs I quoted in bug
> 700213, I first thought that the gettext package includes such info, but
> I failed to find it. Instead I created a 'home made' file which maps
> certain languages with their "main dialects":
> /usr/share/language-selector/data/main-countries
>

There is no way a list like that can be kept manually in such a file
without in-depth knowledge of all languages represented. If this
information could be defined accurately, it should live in
the iso-codes package, which already contains a lot of information about
languages and countries, along with their translations.

Just look at the map of languages and countries:
http://www.ethnologue.com/country_index.asp

What would you do in the case of English for example? What would be the
main country, the UK or US? Or Persian, should we take Iran or
Afghanistan?

> I created it for a slightly different purpose (setting LC_MESSAGES), but
> this bug report made me realize that the file may be useful also when
> generating lists of language options.
>
> > so I'd say we should show all translations present, even if they
> > don't have a country code.
> >
> > In this particular case, we'd show:
> >
> > es
> > es_MX
> > es_PR
>
> In addition to the issue in bug 700213 (assuming I'm right...), there
> are more subjective reasons why I personally would prefer a slightly
> different solution. What you suggest implies that the UI would sometimes
> show both an 'es' and an 'es_ES' option, which may be confusing to
> users, and hence should be avoided.
>

There is no es_ES language, neither in /usr/share/locale nor
in /usr/share/locale-langpack, although I'm not sure this can be
extrapolated to all languages.

However, what I see is that there is always at least an 'll' code, i.e.
there is no situation where there are only 'll_CC' codes for a language.

In any case, it's a tricky problem. Just a couple of cases to illustrate
it:

zh <- I don't know what that is
zh_CN <- Simplified Chinese
zh_TW <- Traditional Chinese, a different locale

gl <- Galician
gl_ES <- Galician, as spoken in Spain

es <- Spanish, as spoken in Spain
(es_ES) <- Non-existent
es_MX <- Spanish, as spoken in Mexico

> Instead, as soon as there are translations in more than one dialect of a
> language available, I'd prefer to include the country in all the options
> for that language. So in this case I think we should show:
>
> es_ES
> es_MX
> es_PR
>
> Please note that all those options will make gettext look for
> translations under .../es unless found under respective country specific
> directory.
>
> Only when there are no translations in other directories but
> /usr/share/locale/es and /usr/share/locale-langpack/es, showing just
> 'es' is preferable IMO.
>
> > Since otherwise, in the current form, users are being forced to use
> > "es_MX" or "es_PR", as "es" alone is not shown.
>
> Indeed. ...


Gunnar Hjalmarsson (gunnarhj) wrote :

Hi David,
You make a few points that would have been very valid if we were to
loudly state the "main dialect" for all languages in the world. But we
aren't. The file I pointed at is aimed at only being used behind the
scenes in order to help

1. prevent confusion and buggy behavior in some cases with respect to
   the locale name assigned to LC_MESSAGES, and

2. improve the accuracy of the lists of language options in
   language-selector and GDM.

The aspects of accuracy I have in mind are:

- Avoid showing e.g. both 'de' and 'de_DE' or both 'es' and 'es_ES',
  since users would consider them to be duplicates of effectively the
  same option.

- "Spanish (Spain)" is a clearer label than just "Spanish" for an option
  that means "Spanish, as spoken in Spain" as opposed to e.g. "Spanish,
  as spoken in Mexico".

- Prevent the risk due to bug 700213 that certain msgid translations
  can't be easily accessed.

The map file isn't perfect; it will never be. I still believe it will
serve its intended purposes for 98% or so of the Ubuntu users, without
messing it up for the other 2%. And if/when people report related bugs,
it will typically be easy to fix them with small code changes, since
there is a tool in place.

On 2011-02-01 09:34, David Planella wrote:
> What would you do in the case of English for example? What would be the
> main country, the UK or US?

The UK, of course. Or would you consider Mexico to be the main country
of Spanish because of its large population? ;-)

Seriously, I'm treating English as a special case, since 'en' is always
present in the LANGUAGE list by design (as the last item). For that
reason, 'en' should always be included in the UI lists of language options.

If you look at /usr/share/language-selector/data/main-countries, you see
the intentionally vague expression "main or origin country". For the
purpose of determining the LC_MESSAGES locale name, I included 'en' =>
'en_GB'. The Americans won't likely object to the claim that English
originated in England, UK...
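
As an illustration of how such a mapping could be used when choosing a locale name for LC_MESSAGES, here is a small Python sketch. The dictionary is a hypothetical excerpt (only 'en' => 'en_GB' is stated in this thread), the function name is invented, and appending '.UTF-8' is an assumption of the example.

```python
# Hypothetical excerpt; only the 'en' entry is confirmed in this discussion.
MAIN_COUNTRY = {"en": "GB", "es": "ES", "de": "DE"}


def lc_messages_name(code, main_country=MAIN_COUNTRY):
    """Return a full locale name for LC_MESSAGES, or None if no country is known."""
    lang, _, country = code.partition("_")
    if not country:
        # A bare language code is completed with its "main or origin country".
        country = main_country.get(lang)
        if country is None:
            return None
    return f"{lang}_{country}.UTF-8"
```

Here `lc_messages_name("en")` gives 'en_GB.UTF-8', while a code that already carries a country, such as 'es_MX', is left with that country.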

> There is no es_ES language, neither in /usr/share/locale nor
> in /usr/share/locale-langpack,

If you take the universe packages into account, there may well be
translations in /usr/share/locale/es_ES. I have a package installed with
translations in all of 'es_AR', 'es_CO', 'es_CR', 'es_ES' and 'es_MX'.

Initially I didn't think of the impact of universe, but it's now taken
into consideration due to the discussion at bug 693337.

> However, what I see is that there is always at least an 'll' code, i.e.
> there is no situation where there are only 'll_CC' codes for a language.

I know of one (or rather two): Chinese. For that reason, 'zh' alone
should never be included in the lists of language options.
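
The two special cases mentioned in this comment ('en' always offered, bare 'zh' never offered) could be applied as a final pass over an option list. The following Python sketch is hypothetical and is not taken from the attached code.

```python
def apply_special_cases(options):
    """Always offer 'en'; never offer a bare 'zh'."""
    adjusted = set(options)
    adjusted.discard("zh")  # only country specific Chinese options are meaningful
    adjusted.add("en")      # 'en' is always the last item in LANGUAGE by design
    return sorted(adjusted)
```

For example, an input of 'zh', 'zh_CN', 'zh_TW' and 'es' comes back as 'en', 'es', 'zh_CN' and 'zh_TW'.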

> In any case, it's a tricky problem.

Yes it is, and I for one appreciate this conversation, through which we
may identify and consider some of the not so obvious pitfalls. Needless
to say, your expertise with respect to language related matters is of
great value to this design discussion.

Attached please find the file "language-options" with the proposed GDM
code for generating the option list. I don't know to which extent the
code itself is useful to ...


Launchpad Janitor (janitor) wrote :

This bug was fixed in the package language-selector - 0.13

---------------
language-selector (0.13) natty; urgency=low

  [ Gunnar Hjalmarsson ]
  * LanguageSelector/gtk/GtkLanguageSelector.py:
    - Ensure that main or origin country is included when country
      specific options for a language are shown (LP: #710148).
    - Do not let an absent translation directory make the program crash
      (LP: #714093).
  * data/LanguageSelector.ui:
    - Shorter label to describe the second tab (LP: #709855).
  * LanguageSelector/macros.py:
    - Use locale names with '.UTF-8' instead of '.utf8' when setting
      LC_* or LANG environment variables (LP: #666565, #700619).
      Thanks to Lauri Tirkkonen for the patch!
 -- Evan Dandrea <email address hidden> Mon, 14 Feb 2011 16:13:04 +0000

Changed in language-selector (Ubuntu):
status: In Progress → Fix Released
Gabor Kelemen (kelemeng)
Changed in ubuntu-translations:
status: Triaged → Fix Released