case-insensitive grep broken with UTF-8 in intrepid, breaking scripts

Bug #243717 reported by Fabien Tassin
Affects: grep (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Binary package hint: grep

grep -i (case-insensitive matching) is badly broken in UTF-8 locales in intrepid.

Intrepid:

$ echo FOOBAR | grep FOO
FOOBAR
$ echo FOOBAR | grep -i FOO
$ echo FOOBAR | grep -i foo
$

The expected behavior is what Hardy does:

$ echo FOOBAR | grep FOO
FOOBAR
$ echo FOOBAR | grep -i FOO
FOOBAR
$ echo FOOBAR | grep -i foo
FOOBAR
$

In both cases, my locale is set to en_US.UTF-8.

Looking at the package, there's debian/patches/66-match_icase.patch showing this:

>>>
This fixes
    echo Y | LC_ALL=en_US.UTF-8 grep -i '[y]'
The expected output is:
    Y

Without this patch, it works on non UTF-8 environment, but fails on UTF-8
environment.

The definition of RE_ICASE comes from the glibc (/usr/include/regex.h)

Maybe lib/posix/regex.h should be removed to enforce the usage of the
glibc's regex.h
<<<

$ echo Y | LC_ALL=en_US.UTF-8 grep -i '[y]'
$ echo Y | LC_ALL=C grep -i '[y]'
Y

So it does seem to be a UTF-8 issue, and the patch is no longer sufficient.
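
The two commands above can be wrapped into a small regression check. This is only a sketch: it assumes a POSIX shell and that the en_US.UTF-8 locale has been generated on the machine, and the helper name check_icase is made up for illustration.

```shell
#!/bin/sh
# check_icase LOCALE (hypothetical helper, not part of the grep package):
# prints "ok" if `grep -i` matches an uppercase Y against the lowercase
# bracket expression [y] under LOCALE, and "FAIL" otherwise.
check_icase() {
    if printf 'Y\n' | LC_ALL="$1" grep -qi '[y]'; then
        echo "ok"
    else
        echo "FAIL"
    fi
}

# Compare the plain C locale with the UTF-8 locale from the report.
# Note: if en_US.UTF-8 is not installed, grep may fall back to the C
# locale, and the second line would then say "ok" without testing anything.
for loc in C en_US.UTF-8; do
    printf '%s: %s\n' "$loc" "$(check_icase "$loc")"
done
```

On an affected system the en_US.UTF-8 line should read FAIL while the C line reads ok; `locale -a` shows whether the UTF-8 locale is actually available.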

John Vivirito (gnomefreak) wrote :

Changed importance to High since we haven't had a confirmed issue in Ubuntu scripts. If one is found, Critical would best describe this bug.
I was also helping Fabien test this over the weekend and I saw the issue as well.

Changed in grep:
importance: Undecided → High
status: New → Confirmed