language pack po files drop cflag comment which causes segfaults in e. g. 'dd'

Reported by Léa GRIS on 2006-04-30
This bug affects 2 people
Affects                          Status      Importance  Assigned to
Ubuntu Translations              -           Medium      Unassigned
gettext                          Incomplete  Undecided   Unassigned
langpack-o-matic                 -           Critical    Martin Pitt
coreutils (Ubuntu)               -           Medium      Unassigned
gettext (Ubuntu)                 -           Medium      Unassigned
language-pack-es-base (Ubuntu)   -           Medium      Martin Pitt
language-pack-fr-base (Ubuntu)   -           Medium      Unassigned
language-pack-pl-base (Ubuntu)   -           Medium      Unassigned
language-pack-ru-base (Ubuntu)   -           Medium      Unassigned

Bug Description

Bug candidate:
Package: language-pack-fr-base
Versions:
1:6.06+20060427

Reproducing the bug:
lea@coincoin:~$ dd if=/dev/zero of=testfile bs=16k count=1000
1000+0 records in
1000+0 records out
Erreur de segmentation
(aka: segfault)

lea@coincoin:~$ export LANG=en
lea@coincoin:~$ dd if=/dev/zero of=testfile bs=16k count=1000
1000+0 records in
1000+0 records out
16384000 bytes (16 MB) copied, 0.046938 seconds, 349 MB/s

Possible source of bug:
flawed translation of string "%n bytes (%n) copied, %f seconds, %n MB/s"

Laurent Bigonville (bigon) wrote :

I can confirm this

Changed in language-pack-fr-base:
status: Unconfirmed → Confirmed
Ilya Petrov (ilya-muromec) wrote :

also segfaults on ru_RU and ru_UA

I had the same issue with a self-extracting .bin shell script from java.sun.com, because I'm using a French locale too. Switching to English as reported works great.

Laurent Bigonville (bigon) wrote :

Seems to be working with French locales now.

Laurent Bigonville (bigon) wrote :

Is the problem solved with Russian locales?

Changed in coreutils:
status: Unconfirmed → Needs Info
Ilya Petrov (ilya-muromec) wrote :

No, it segfaults as of now (1:6.06+20060511).

Changed in language-pack-fr-base:
status: Confirmed → Fix Released
Changed in language-pack-ru-base:
status: Unconfirmed → Confirmed
Laurent Bigonville (bigon) wrote :

Oops, too fast... it still segfaults.

Changed in language-pack-fr-base:
status: Fix Released → Confirmed
Laurent Bigonville (bigon) wrote :

I've tried with the translation from version 5.95 and it works...

Xavier Claessens (zdra) wrote :

Same problem here with FR locales:

zdra@zdra-desktop:~$ dd if=/dev/zero of=test.img count=1
1+0 records in
1+0 records out
Erreur de segmentation
zdra@zdra-desktop:~$

Laurent Bigonville (bigon) wrote :

If I add "#, c-format" before

msgid "1 byte (1 B) copied"
msgid_plural "%<PRIuMAX> bytes (%s) copied"
msgstr[0] "1 octet (1 o) copié"
msgstr[1] "%<PRIuMAX> octets (%s) copiés"

it works...

Xavier Claessens (zdra) wrote :

Here is a call stack of the crash if needed:

#0 0xb7f072a3 in strlen () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7edb2e4 in vfprintf () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7ed7d7c in cuserid () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7ed7fbb in vfprintf () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7ee06af in fprintf () from /lib/tls/i686/cmov/libc.so.6
#5 0x080497b2 in print_stats () at dd.c:553
#6 0x0804b3fa in main (argc=4, argv=0xbfaf1034) at dd.c:600

Xavier Claessens (zdra) wrote :

A more complete trace with debug infos:

#0 0xb7e51eb5 in _IO_vfprintf (s=0xbfe51000, format=0xb7c9c508 "%<PRIuMAX> octets (%s) copiés", ap=0xbfe536c8 "") at vfprintf.c:1819
#1 0xb7e4eaf4 in buffered_vfprintf (s=0xb7f27b20, format=0xb7c9c508 "%<PRIuMAX> octets (%s) copiés", args=0xbfe536c8 "") at vfprintf.c:2123
#2 0xb7e4ecb6 in _IO_vfprintf (s=0xb7f27b20, format=0xb7c9c508 "%<PRIuMAX> octets (%s) copiés", ap=0xbfe536c8 "") at vfprintf.c:1246
#3 0xb7e576eb in *__GI_fprintf (stream=0x0, format=0x0) at fprintf.c:32
#4 0x080497b2 in print_stats () at dd.c:553
#5 0x0804b3fa in main (argc=4, argv=0xbfe53b84) at dd.c:600

Xavier Claessens (zdra) wrote :

To me, the bug comes from gettext or coreutils. As you can see, the literal %<PRIuMAX> is passed to fprintf. With LANG=C it is replaced by %llu, so fprintf recognizes that it has to substitute an integer there. %<PRIuMAX> isn't recognized, so the integer is consumed by the %s instead, and that causes the segfault.

Also segfaults with Dutch (nl_NL) locale, but works fine when set to English.

LeoRochael (leorochael) wrote :

Also segfaults with pt_BR locale

Still the same for pl_PL.UTF-8.
All packages up-to-date with Ubuntu Dapper.

Simon Law (sfllaw) on 2006-07-10
Changed in language-pack-pl-base:
importance: Untriaged → Medium
status: Unconfirmed → Confirmed
Laurent Bigonville (bigon) wrote :

Still occurs in edgy with French translations.

I upgraded this system from breezy to dapper via dist-upgrade, so that may very well be the source of the problem.

#> echo $LANG
de_DE.UTF-8
#> echo hallo > blubb
#> dd if=blubb of=blah
0+1 records in
0+1 records out
Segmentation fault

last lines of an strace:

open("blubb", O_RDONLY|O_LARGEFILE) = 0
_llseek(0, 0, [0], SEEK_CUR) = 0
close(1) = 0
open("blah", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 1
rt_sigaction(SIGUSR1, NULL, {SIG_DFL}, 8) = 0
rt_sigaction(SIGINT, NULL, {SIG_DFL}, 8) = 0
rt_sigaction(SIGUSR1, {0x8049a2c, [INT USR1], 0}, NULL, 8) = 0
rt_sigaction(SIGINT, {0x8049a1f, [INT USR1], SA_NOMASK|SA_ONESHOT}, NULL, 8) = 0
clock_gettime(CLOCK_MONOTONIC, {162303, 300780064}) = 0
read(0, "hallo\n", 512) = 6
read(0, "", 512) = 0
write(1, "hallo\n", 6) = 6
close(0) = 0
close(1) = 0
clock_gettime(CLOCK_MONOTONIC, {162303, 322651064}) = 0
open("/usr/share/locale/de_DE/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/de/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/de_DE/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/de/LC_MESSAGES/coreutils.mo", O_RDONLY) = 0
fstat64(0, {st_mode=S_IFREG|0644, st_size=193235, ...}) = 0
mmap2(NULL, 193235, PROT_READ, MAP_PRIVATE, 0, 0) = 0xb7caf000
close(0) = 0
open("/usr/share/locale/en_GB/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en_GB/LC_MESSAGES/coreutils.mo", O_RDONLY) = 0
fstat64(0, {st_mode=S_IFREG|0644, st_size=239094, ...}) = 0
mmap2(NULL, 239094, PROT_READ, MAP_PRIVATE, 0, 0) = 0xb7c74000
close(0) = 0
open("/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "0+1 records in\n0+1 records out\n", 310+1 records in
0+1 records out
) = 31
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

M0dusFRee (josebal) wrote :

Also Segmentation fault with es_ES.UTF-8 locale.

xxx@xxx:~$ echo $LANG
es_ES.UTF-8
xxx@xxx:~$ dd if=/dev/zero of=temp_1 bs=1k count=1
1+0 records in
1+0 records out
Fallo de segmentación

Alexander Schulze (schulze) wrote :

As Laurent already mentioned, the problem disappears when "#, c-format" is added to the PO file in front of the problematic entry. I think this is the key to understanding what is really going wrong here.

It seems that first running msgunfmt, followed by msgfmt, does not necessarily yield the same MO file again, as one could guess. If C99 format specifications like PRIuMAX are used, msgfmt creates an invalid MO (see below) if "c-format" is not specified explicitly, leading to the crash. In my opinion, msgunfmt should be modified to output "#, c-format" lines for entries that use system-dependent modifiers like PRIuMAX (which should be a rather simple fix), or (better, IMO, as this is the real cause of the problem) msgfmt's heuristic to detect C printf strings should be modified not to require "#, c-format" lines in these cases. As I understand the manual, "#, c-format" should only indicate that the programs should do *additional checks*, but should not *change the behaviour* that dramatically if the strings are valid printf formats.

For me, it seems to be a bug in the gettext package, and it seems to be still without fix in the newest available version. Forwarding the bug report upstream is recommended.

Note that this problem may also affect other packages' MO files and may lead to crashes in many other programs. Perhaps the severity of the bug should be raised, especially as there is a simple workaround that works quite well here: just do a msgunfmt-msgfmt cycle on each MO file, but insert a simple (sed?) script in between that prefixes all records with "#, c-format" lines. So far I have seen no negative side effects, and the problems are gone. This workaround could be applied to all MO files in all language packs before a fixed gettext becomes available, and requires only a minimal amount of resources (CPU time etc.).
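
The in-between step of that workaround could look roughly like the sketch below. It is only an illustration, not the script langpack-o-matic uses: the function name add_c_format is made up, awk stands in for the suggested sed script, and it assumes the PO text comes straight from msgunfmt, so that entries carry no flag comments yet and none of them use msgctxt.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of gettext):
# copy a PO file from stdin to stdout and put a "#, c-format" flag line
# in front of every entry, i.e. before each line starting with "msgid ".
# "msgid_plural" lines are left alone because they do not match "msgid ".
add_c_format() {
    awk '/^msgid / { print "#, c-format" } { print }'
}

# Intended use between the two gettext tools (file names are examples):
#   msgunfmt coreutils.mo -o coreutils.po
#   add_c_format < coreutils.po > coreutils.fixed.po
#   msgfmt coreutils.fixed.po -o coreutils.mo
```

This is deliberately blunt: it flags every record, including the header entry, exactly as the workaround describes. A more careful version would flag only entries that actually contain a format directive.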

Comparing Ubuntu's MO files with those of Debian stable (or SuSE), I find that the binary content of the MOs in these distributions corresponds to that which I get when running msgfmt with the "#, c-format" fixed POs (i.e., %<PRIuMAX> does not appear inside the strings, at least not in the translations, but is replaced by a single % instead, and PRIuMAX appears as a system-dependent string of its own, which is not the case for Ubuntu's MOs). These distributions seem to generate the POs by xgettext or similar, so that all c-format hints are available and used. I'm not quite sure what the translation teams of Ubuntu use, but that may make the difference why it is hitting just Ubuntu alone (as far as I know and could test here).
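
Based on that observation, a rough way to list MO files that may have been built without the c-format hint is to look for the literal text %<PRI inside them. This is a heuristic sketch only: find_suspect_mo is a made-up name, and whether a hit really means a broken file has not been verified beyond the comparison described above.

```shell
#!/bin/sh
# Hypothetical helper: print every *.mo file below the given directory
# that contains the literal bytes "%<PRI". According to the comparison
# above, correctly built MO files do not carry this text in their
# translations, so a hit is a hint (not proof) of a missing c-format flag.
find_suspect_mo() {
    find "$1" -type f -name '*.mo' | while IFS= read -r mo_file; do
        # -F: fixed string, -q: only the exit status matters.
        # LC_ALL=C makes grep treat the binary file as plain bytes.
        if LC_ALL=C grep -q -F '%<PRI' "$mo_file"; then
            printf '%s\n' "$mo_file"
        fi
    done
}

# Example (directory as seen in the strace output in this report):
#   find_suspect_mo /usr/share/locale-langpack
```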

Alexander Schulze (schulze) wrote :

The problem in Bug 42264 seems to be related to the fact that gettext has problems identifying C printf strings as such when C99 specifiers like PRIuMAX are present, and requires a "#, c-format" hint in these cases. Note that running msgunfmt on a MO, followed by msgfmt on the intermediate PO, does not yield the original MO any more, if specifiers like PRIuMAX were present *and* the original MO was created using a c-format hint. In my interpretation, this is a bug in the msgfmt/msgunfmt tools, therefore raising the priority by one step to "Needs info".

Changed in gettext:
status: Unconfirmed → Needs Info
Dennis Kaarsemaker (dennis) wrote :

Added launchpad task to make sure launchpad maintainers are aware of this corner case -- the launchpad exporter must be able to get this right too.

Alexander Jones (alex-weej) wrote :

en_GB.UTF-8 crashing, too.

Ryan Lortie (desrt) wrote :

hello. my name is en_CA.UTF-8.

Jamie Lokier (jamie-shareable) wrote :

I confirm this trivial dd command:

    dd if=/dev/zero of=/dev/null bs=1k count=1

crashes with a fresh Ubuntu 6.06 installation off the CD with updates. (i386, 32-bit).

It crashes for LANG=en_GB.UTF-8 and LANG=en_US.UTF-8 but not LANG=C, LANG=en_GB or LANG=en_US.
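
To compare locales the way it is done above, a small loop can run the same command under each locale and print the exit status. This is a sketch, with try_locales being a hypothetical helper name. A process killed by SIGSEGV normally shows up in sh as status 139 (128 + 11).

```shell
#!/bin/sh
# Hypothetical helper: run a shell command once per locale and print
# "<locale> <exit status>". LC_ALL, LC_MESSAGES and LANGUAGE (a GNU
# gettext variable) are set to the empty string so that, as far as the
# message catalog is concerned, LANG decides which locale is used.
try_locales() {
    cmd=$1
    shift
    for loc in "$@"; do
        env LC_ALL= LC_MESSAGES= LANGUAGE= LANG="$loc" sh -c "$cmd" >/dev/null 2>&1
        printf '%s %s\n' "$loc" "$?"
    done
}

# Example (locale names taken from the comment above):
#   try_locales 'dd if=/dev/zero of=/dev/null bs=1k count=1' \
#       C en_GB en_US en_GB.UTF-8 en_US.UTF-8
```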

I found this to be a problem when various build processes (which use dd) for my own programs started crashing...

David Marín (davefx) wrote :

I confirm that it segfaults with es_ES.UTF-8 locale too.

David Marín (davefx) on 2006-08-22
Changed in language-pack-es-base:
status: Unconfirmed → Confirmed

Notice that the dd *did* work, that is, the output file is correct. It's just a segfault in the final message (that can create confusion in callers, yes, but dd works just ok).

Romano Giannetti wrote:
> Notice that the dd *did* work, that is, the output file is correct. It's
> just a segfault in the final message (that can create confusion in
> callers, yes, but dd works just ok).

It breaks build scripts / Makefiles, as the exit code is non-zero.
That's the main problem I've had.

Of course it's possible to work around this by modifying those scripts
to ignore the exit code. But it's just as easy to set LANG=C or (as I
have done) to write a /usr/local/bin/dd script containing

    #!/bin/sh
    # Force the untranslated messages, then run the real dd.
    export LANG=C
    exec /bin/dd "$@"

-- Jamie
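
For build scripts that should not depend on a wrapper in /usr/local/bin, the same idea can be applied per command. The sketch below is an assumption-laden variant, not something proposed in this report: with_c_locale is a made-up name, and it sets LC_ALL rather than LANG because LC_ALL takes precedence when both are set.

```shell
#!/bin/sh
# Hypothetical helper: run any command with the C locale forced, so that
# no translated message catalog is used. The exit status of the command
# is passed through unchanged.
with_c_locale() {
    env LC_ALL=C "$@"
}

# Example in a build script:
#   with_c_locale dd if=/dev/zero of=/dev/null bs=1k count=1 || exit 1
```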

dd ALSO fails on XUBUNTU

# echo $LANG
en_GB.UTF-8

# dd if=/dev/zero of=/dev/null bs=1k count=1
1+0 records in
1+0 records out
Segmentation fault

Eugene Kravtsoff (ekrava) wrote :

msgunfmt /usr/share/locale-langpack/ru/LC_MESSAGES/coreutils.mo -o coreutils.po
nano coreutils.po

Add the line

#, c-format

before the lines

msgid "1 byte (1 B) copied"
msgid_plural "%<PRIuMAX> bytes (%s) copied"
msgstr[0] "скопирован 1 байт (1 B)"
msgstr[1] "скопировано %<PRIuMAX> байта (%s)"
msgstr[2] "скопировано %<PRIuMAX> байт (%s)"

sudo msgfmt coreutils.po -o /usr/share/locale-langpack/ru/LC_MESSAGES/coreutils.mo

Thanks to Ilya Petrov.

I think this IS a critical bug. Low level simple utilities like dd should work flawlessly!

I can confirm this bug using en_US.UTF-8.

Thomas David Baker (bakert) wrote :

Confirmed for en_GB on 6.06.1

$ dd if=/dev/zero of=x bs=1 count=2
2+0 records in
2+0 records out
Segmentation fault

Léa GRIS (lea-gris) wrote :

Such a confirmed crash bug affecting coreutils should be marked as high priority and be fixed in the current release.

It is now four months old without a fix. Sorry for pointing this out and sounding sharp-edged. I have no ability to fix it myself or to be of significant help on this.

At least, please raise the priority or criticality.

Wenzhuo Zhang (wenzhuo) wrote :

also segfaults under en_US.UTF-8 locale in 6.06.1:

wenzhuo@thinkpad:~$ dd if=/dev/zero of=/dev/sda bs=1024 count=10
10+0 records in
10+0 records out
Segmentation fault
wenzhuo@thinkpad:~$ echo $LANG
en_US.UTF-8

Martin Pitt (pitti) wrote :

My fault, sorry. msgfmt(msgunfmt()) produces something different from the original, since msgunfmt does not restore the "#, c-format" comments. I will work around that in langpack-o-matic, so that the next langpack updates are fixed.

Changed in rosetta:
assignee: nobody → pitti
status: Unconfirmed → In Progress
Martin Pitt (pitti) wrote :

not a coreutils bug

Changed in coreutils:
status: Needs Info → Rejected
Changed in langpack-o-matic:
importance: Untriaged → Critical
Changed in gettext:
status: Needs Info → Confirmed
Martin Pitt (pitti) wrote :

see above, will be fixed with next langpack update

Changed in language-pack-es-base:
assignee: nobody → pitti
importance: Untriaged → Medium
status: Confirmed → In Progress
Martin Pitt (pitti) wrote :

this bug affects all language packs, thus I only keep the bug open for an example language pack.

Changed in language-pack-fr-base:
status: Confirmed → Rejected
Changed in language-pack-pl-base:
status: Confirmed → Rejected
Changed in language-pack-ru-base:
status: Confirmed → Rejected
Wenzhuo Zhang (wenzhuo) wrote :

I'd suggest releasing a fix as soon as possible instead of waiting for the next langpack update, because this bug fix alone is much more important than a routine langpack update.

Martin Pitt (pitti) wrote :

Fixed in langpack-o-matic, today's daily packages should be good.

Changed in langpack-o-matic:
status: In Progress → Fix Released
Martin Pitt (pitti) wrote :

Fixed in edgy yesterday with the upload of new langpacks. Keeping langpack task open until dapper is fixed, too.

Martin Pitt (pitti) wrote :

Dapper language pack uploading stalled due to current *-updates embargo.

Changed in language-pack-es-base:
status: In Progress → Fix Committed
AlejandroRiveira (ariveira) wrote :

I can confirm this for Spanish (es).

GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) run if=/dev/zero of=/dev/null bs=1MB count=4
Starting program: /bin/dd if=/dev/zero of=/dev/null bs=1MB count=4
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1478412608 (LWP 7393)]
(no debugging symbols found)
4+0 records in
4+0 records out

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1478412608 (LWP 7393)]
0xa7e912a3 in strlen () from /lib/tls/i686/cmov/libc.so.6

AlejandroRiveira (ariveira) wrote :

I confirm that it segfaults with es_ES.UTF-8 locale too.

Up to date dapper with backports

Martin Pitt (pitti) wrote :

New langpacks landed in dapper-proposed yesterday, testing appreciated. If all goes well, these langpacks will go to -updates next Wednesday.

Is this still a bug in dd, if the language packs can cause it to crash?

On 10/12/06, Martin Pitt <email address hidden> wrote:
> New langpacks landed in dapper-proposed yesterday, testing appreciated.
> If all goes well, these langpacks will go to -updates next Wednesday.
>
> --
> language pack po files drop cflag comment which causes segfaults in e. g. 'dd'
> https://launchpad.net/bugs/42264
>

Martin Pitt (pitti) wrote :

Hi Paul,

PaulSchulz [2006-10-12 9:00 -0000]:
> Is this still a bug in dd, if the language packs can cause it to crash?

No, it's not. The root cause of the bug is in the langpack-o-matic
scripts. There is nothing dd itself could do to prevent this crash.

Martin Pitt (pitti) wrote :

Dapper got new language packs today. For the majority of cases this problem is fixed in dapper now. I'll see to fixing it completely in the next round.

Vassilis Pandis (pandisv) wrote :

Similar problem reported in bug 75477, with LANG=pl_PL.UTF-8 .

Martin Pitt (pitti) wrote :

Dapper just got fresh -base packages, so all instances of this should be fixed now.

Changed in language-pack-es-base:
status: Fix Committed → Fix Released
Konrad Materka (kmaterka) wrote :

Thanks, but why did this have to wait almost a year for a fix?
Anyway, thanks again.

Martin Pitt (pitti) wrote :

Hi Konrad,

Konrad Materka [2007-02-09 11:37 -0000]:
> Thanks, but why did this have to wait almost a year for a fix?

because probably 99% of the cases were already fixed soon after the
initial report, and I did not get any further reports after that. I
just kept the bug open until I could make sure that *all* cases were
indeed fixed.

Alexander Jones (alex-weej) wrote :

Do we have a test case for this? Is anybody still reporting any crashes here? (Though segfault detection is switched off now which is a bit of a "lalalala can't hear you" kind of approach to fixing bugs, but that's another story...)

What exactly do you mean by "segfault detection is switched off now"?

Alexander Jones (alex-weej) wrote :

I mean that during the Dapper pre-release, any segfault was caught and the Launchpad crash reporter would pop up. That doesn't happen anymore; applications simply explode and disappear without you even noticing most of the time. For me, dd was crashing in an installation script. Now, you wouldn't even notice: the installation would have to pick up on it.

OK. I feared it wouldn't even spit out an error on the command line. So it's "just" back to normal.

Alexander Jones (alex-weej) wrote :

Is this still affecting gettext?

Martin Pitt (pitti) wrote :

Alex,

yes, it is. msgunfmt should restore the "c-format" flag. A msgfmt/msgunfmt cycle should not break the resulting .po file.

Adi Roiban (adiroiban) on 2009-10-31
Changed in ubuntu-translations:
status: New → Triaged
importance: Undecided → Medium
David Planella (dpm) on 2010-02-22
Changed in ubuntu-translations:
status: Triaged → Fix Released

Thank you for posting this bug.

Does this occur in Maverick?

Changed in gettext:
status: New → Incomplete
Changed in gettext (Ubuntu):
status: Confirmed → Incomplete