GNU sort
is a ubiquitous utility, used in GNU/Linux, FreeBSD, and Mac OS X.
While being very mature and flexible, it is quite tricky to use,
due to backwards compatibility and many options.
Consequently, it's probably the utility that prompts the most questions on the main coreutils mailing list.
Although the caveats are well documented, the documentation is necessarily long and complicated.
So to help users more directly, we've added the --debug option, which
prints helpful warnings and annotates the input.
There are three types of output from --debug: info, warnings, and key annotations.
Info
The only info currently reported is the locale that is being used to sort, which is a common cause of confusion for users.

$ sort --debug /dev/null
sort: using `en_US.UTF-8' sorting rules
$ LC_ALL=C sort --debug /dev/null
sort: using simple byte comparison
$ LC_ALL=en_US.missing sort --debug /dev/null
sort: using simple byte comparison
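To illustrate why the reported locale matters, here is a minimal sketch that sorts four made-up lines under the C locale, where comparison is by byte value and uppercase ASCII letters therefore come before lowercase ones. The sample input is purely illustrative; under a locale such as en_US.UTF-8 the order will typically differ, depending on which locales are installed.

```shell
# Hypothetical sample input: mixed-case single letters.
input='b
A
a
B'

# Simple byte comparison: 'A' (0x41) and 'B' (0x42) sort before
# 'a' (0x61) and 'b' (0x62).
c_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# For comparison, sort with whatever locale the environment provides;
# --debug reports on stderr which rules were actually used.
printf '%s\n' "$input" | sort --debug >/dev/null
```

With LC_ALL=C the first command prints A, B, a, b, one per line.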
Warnings
Here is a contrived example that shows all of the warnings currently reported.

$ sort --debug -rb -k1n +2.2 -2b /dev/null
sort: using `en_US.UTF-8' sorting rules
sort: key 1 is numeric and spans multiple fields
sort: obsolescent key `+2 -2' used; consider `-k 3,2' instead
sort: key 2 has zero width and will be ignored
sort: leading blanks are significant in key 2; consider also specifying `b'
sort: option `-b' is ignored
sort: option `-r' only applies to last-resort comparison

Taking a more realistic example in isolation:

$ sort --debug -s -r -k1,1n /dev/null
sort: using `en_US.UTF-8' sorting rules
sort: option `-r' is ignored

[Update Oct 2021:
New warnings are added related to the handling of thousands grouping characters, decimal points, and sign characters. For example:

$ printf '0,9\n1,a\n' | sort -nk1 --debug -t, -s
sort: key 1 is numeric and spans multiple fields
sort: field separator ‘,’ is treated as a group separator in numbers
1,a
_
0,9
___

For more examples and details see the commit. ]
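One way to act on the "spans multiple fields" warning is to end the key explicitly at the first field. The sketch below reuses the input from the example above and is only an illustration of that idea: with -k1,1n the ',' separator is never part of the text parsed as a number, so the keys compare as 0 and 1 whatever the locale's grouping character is.

```shell
# Same input as the example above, but the key stops at the end of field 1,
# so only "0" and "1" are parsed as numbers.
limited=$(printf '0,9\n1,a\n' | sort -t, -k1,1n -s)
printf '%s\n' "$limited"
```

This prints 0,9 followed by 1,a.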
Key annotations
Key annotations are generally useful to confirm the extents of the keys being matched, especially when one needs to define character offsets.

In this example we see that there can be two comparisons per line. The last-resort one (performed because we didn't specify -s) is what disturbs the input order here:

$ printf "1.1 four\n1.1 five\n" | sort -n --debug 2>/dev/null
1.1 five
___
________
1.1 four
___
________

Here we can see how TAB characters are distinguished with '>', and the complicated number matching of the '-g' option.

$ printf "0x3e4\n1.1\n +2" | cat -n | sort -gs -k2,2 --debug 2>/dev/null
     2>1.1
       ___
     3> +2
        __
     1>0x3e4
       _____

Here we see how leading blanks are significant in the comparison fields, which in this case can be used to efficiently sort right-aligned numbers. Note the significance of LANG=C here, which avoids issues with blanks being ignored in the comparison in some locales.

$ printf '...%6s\n' 9 10 | LANG=C sort -s -k2,2 --debug 2>/dev/null
...     9
   ______
...    10
   ______
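Returning to the first example in this section, the effect of the last-resort comparison can be checked without --debug by running the same input with and without -s. This is a minimal sketch; LC_ALL=C is an added assumption here so that the whole-line fallback is a plain byte comparison.

```shell
# Both lines have the same numeric key (1.1), so plain -n falls back to a
# whole-line comparison as a last resort, which puts "five" before "four".
unstable=$(printf '1.1 four\n1.1 five\n' | LC_ALL=C sort -n)

# With -s the last-resort comparison is disabled, so lines with equal keys
# keep their input order.
stable=$(printf '1.1 four\n1.1 five\n' | LC_ALL=C sort -n -s)

printf 'without -s:\n%s\nwith -s:\n%s\n' "$unstable" "$stable"
```

Without -s the lines come out as "1.1 five" then "1.1 four"; with -s the original order is preserved.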
© May 17 2010