sort's -u is failing to check all -k fields for uniqueness

Bug #56891 reported by Ralph Corderoy
Affects: coreutils (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: coreutils

Tested with sort 5.2.1 from coreutils 5.2.1-2ubuntu0 and sort 5.93 from
5.93-5ubuntu4.

sort's -u (unique) option isn't working as expected and the info (spit!)
documentation doesn't match its behaviour either so I'd expect one or
the other to change.

Given:

    $ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -t / -k 1,2
    1/1
    1/2
    1/3
    2/1

OK. Add -n (numeric):

    $ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -n -t / -k 1,2
    1/1
    1/2
    1/3
    2/1

Still OK. Add -u (unique):

    $ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -nu -t / -k 1,2
    1/3
    2/1

Despite sorting on fields 1 to 2 inclusive, the uniqueness has been
judged on just field 1. At least that's my guess at what's happening,
backed up by:

    $ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -u -t / -k 1,1
    1/3
    2/1

and:

    $ echo 1/3,1/2,1/1,2/1 | tr , \\012 | sort -u -t / -k 2,2
    1/1
    1/2
    1/3

I'd expect that if sorting on N fields, lines omitted due to -u have to
match the line output in all N fields. The manual doesn't suggest
anything different:

    `-u'
    `--unique'
        Normally, output only the first of a sequence of lines that
        compare equal. For the `--check' (`-c') option, check that no
        pair of consecutive lines compares equal.

The lines clearly don't compare equal since without -u it reverses the
order of the first three lines of input. Consequently, all three should
be output with -u. The POSIX spec. seems to support the info
documentation:
http://www.opengroup.org/onlinepubs/000095399/utilities/sort.html

At the very least this seems to need a documentation fix, but I believe
the behaviour is wrong. The impact of it can be serious, e.g. I'm in the
middle of sorting a list of files to back up!

Ralph Corderoy (ralph-inputplus) wrote :

I think I've given sufficient detail to re-create the issue, so it warrants
further investigation. Otherwise, waiting for someone else to hit the same
problem and find this report might mean it never gets confirmed.

Changed in coreutils:
status: Unconfirmed → Confirmed
Ralph Corderoy (ralph-inputplus) wrote :

Sorry, this isn't a bug. With -n, -k1,$n doesn't sort on the first $n fields
but on the numeric prefix of the key spanning the first $n fields, i.e. the
`/' separator character between fields 1 and 2 stops the numeric comparison.
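
For anyone landing here later, a minimal sketch of the above using the same
input. It assumes GNU sort: -s is a GNU extension, not POSIX, and the
variable and function names are only for illustration. The first command
shows that the numeric keys of the first three lines really do compare
equal; the ordering seen without -u comes only from sort's last-resort
whole-line comparison, which -s turns off. The second is one way to get
what I originally wanted, giving each field its own numeric key so that
-u judges uniqueness on both.

```shell
# Input from the report, one entry per line.
input() { printf '1/3\n1/2\n1/1\n2/1\n'; }

# With -n the key "1/3" is read as the number 1, because `/' ends the
# numeric prefix. -s (assumed GNU sort) disables the last-resort
# whole-line comparison, so lines with equal keys keep their input order.
stable=$(input | LC_ALL=C sort -n -s -t / -k 1,2)

# One numeric key per field: -u now drops a line only if both fields
# compare equal to those of a line already output.
per_field=$(input | LC_ALL=C sort -u -t / -k 1,1n -k 2,2n)

printf '%s\n' "-n -s -k 1,2:" "$stable" "-u -k 1,1n -k 2,2n:" "$per_field"
```

The first list comes out in input order (1/3, 1/2, 1/1, 2/1) and the second
keeps all four lines, sorted numerically on both fields.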

Changed in coreutils:
status: Confirmed → Rejected