sort's -u is Failing to Check all -k fields for Uniqueness.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
coreutils (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
Binary package hint: coreutils
Tested with sort 5.2.1 from coreutils 5.2.1-2ubuntu0 and sort 5.93 from
5.93-5ubuntu4.
sort's -u (unique) option isn't working as expected, and the info (spit!)
documentation doesn't match its behaviour either, so I'd expect one or
the other to change.
Given:
$ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -t / -k 1,2
1/1
1/2
1/3
2/1
OK. Add -n (numeric):
$ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -n -t / -k 1,2
1/1
1/2
1/3
2/1
Still OK. Add -u (unique):
$ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -nu -t / -k 1,2
1/3
2/1
Despite sorting on fields 1 to 2 inclusive, the uniqueness has been
judged on just field 1. At least that's my guess at what's happening,
backed up by:
$ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -u -t / -k 1,1
1/3
2/1
and:
$ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -u -t / -k 2,2
1/1
1/2
1/3
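For comparison, here is a minimal sketch of the output I'd expect from -u, assuming each field is given its own numeric key instead of one key spanning fields 1 to 2. The variable names `input` and `result` are only for illustration, and I'm assuming the `n` key modifier behaves in the tested versions as it does in current GNU sort:

```shell
# Same four entries as in the examples above, one per line.
input=$(printf '1/3\n1/2\n1/1\n2/1\n')

# Separate keys: field 1 compared numerically, then field 2 compared
# numerically. With -u, a line is dropped only if BOTH keys match.
result=$(printf '%s\n' "$input" | sort -u -t / -k 1,1n -k 2,2n)

printf '%s\n' "$result"
```

With the keys split like this, all four lines differ in at least one key, so none of them should be discarded.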
I'd expect that if sorting on N fields, lines omitted due to -u have to
match the line output in all N fields. The manual doesn't suggest
anything different:
`-u'
`--unique'
Normally, output only the first of a sequence of lines that
compare equal. For the `--check' (`-c') option, check that no
pair of consecutive lines compares equal.
The lines clearly don't compare equal since without -u it reverses the
order of the first three lines of input. Consequently, all three should
be output with -u. The POSIX spec. seems to support the info
documentation:
http://
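One way to probe whether the key really differs between those lines, offered as a sketch rather than a diagnosis: GNU sort's -s (--stable) option is documented as disabling the last-resort whole-line comparison, so any reordering that remains has to come from the -k key itself. The variable names `input` and `stable` are only for illustration:

```shell
# Same four entries as in the examples above, one per line.
input=$(printf '1/3\n1/2\n1/1\n2/1\n')

# -s turns off the last-resort comparison of whole lines, so lines whose
# -n key compares equal keep their input order.
stable=$(printf '%s\n' "$input" | sort -s -n -t / -k 1,2)

printf '%s\n' "$stable"
```

If the first three lines come back in their input order here, the numeric key spanning fields 1 to 2 is treating them as equal, and the order seen without -u comes from the last-resort comparison rather than from the key.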
At the very least this seems to need a documentation fix, but I believe the
behaviour is wrong. The impact of it can be serious, e.g. I'm in the
middle of sorting a list of files to back up!
I think I've given sufficient detail to re-create the issue, so it warrants
further investigation. If it has to wait for someone else to hit the same
problem and find this report, it might never get confirmed.