coreutils inbox - Dec 2014

This status update (subscribe via RSS) comes about 12 months after the last one, and coincides loosely with the upcoming release 8.24.
Note also the bug tracker with stats which records additions, feature requests and issues.

Rejected ideas

Some of the hardest work on coreutils is knowing what to reject and providing appropriate justification to the contributors. The contributions below all arrived since the last update and, while good ideas, were not included for various reasons detailed on the mailing list. They are also included in a full list of rejected coreutils requests.

ls -z,--zero to NUL terminate entries. Existing tools like find(1) were thought sufficient
wc -q to suppress the file name. Redirecting file to stdin is sufficient
cp,mv --bwlimit to throttle data transfer rates. This is available in rsync, and is better suited to higher level tools
timeout setting a TIMEOUT env var. The use case is unusual, and is already supported by explicitly setting variables with env etc.
uniq --check-fields=N to only check N fields. uniq --key would be a more general solution
csplit --output=N to output only the Nth file. This was thought too specialized to support
dd conv=offload to offload copying to various backends. This was thought too specialized to support explicitly
wc --max-chars=N to filter out long lines. Existing filters like awk 'length($0) <= 3' were deemed more appropriate
md5sum --no-filename to only output the checksum. Postprocessing the output was deemed sufficient
rename command (from util-linux). Existing commands can do this, and adjusting for inclusion in coreutils was thought too disruptive
stat --files0-from=FILE. This is only needed for commands needing to process all arguments in a single invocation
stat --digest-type=WORD. It was thought better to use the existing checksum utils and join the file names etc. separately
stat --quoting-style=WORD. Adjustments to --format='%N' were thought more appropriate
dd conv=truncpost to support filtering files in place. Due to error handling concerns, this was not thought useful enough
sort --header to exclude leading lines from the sort. sed, head, etc. were deemed sufficient
testline program to expose bloom filter functionality. It was thought options to existing tools were more appropriate
du --sort to sort by disk usage. This is already supported directly by du -h | sort -h
yes -n to not output a '\n'. yes whatever | tr -d '\n' was thought sufficient
ls --sort=inode. It was thought find ... | sort was more appropriate for this low level functionality
touch --verbose. It was thought that xargs --verbose or (set -x; touch *) was sufficient
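
Several of the alternatives named in the list above can be tried directly. The following is a minimal sketch, assuming GNU coreutils and awk under a POSIX shell. The scratch directory, the file name data.txt and the variable names are illustrative only and do not come from the mailing list discussions.

```shell
#!/bin/sh
# Scratch directory and sample file (hypothetical names, for illustration).
tmpdir=$(mktemp -d)
printf 'name\ncherry\napple\nfig\n' > "$tmpdir/data.txt"

# Instead of wc -q: redirect the file to stdin so no file name is printed.
line_count=$(wc -l < "$tmpdir/data.txt")

# Instead of md5sum --no-filename: postprocess to keep only the checksum.
checksum=$(md5sum "$tmpdir/data.txt" | cut -d ' ' -f 1)

# Instead of sort --header: print the first line as is, then sort the rest.
sorted=$(head -n 1 "$tmpdir/data.txt"; tail -n +2 "$tmpdir/data.txt" | sort)

# Instead of wc --max-chars=N: keep only lines of at most 3 characters.
short_lines=$(awk 'length($0) <= 3' "$tmpdir/data.txt")

# Instead of yes -n: strip the newlines afterwards.
joined=$(yes ab | head -n 3 | tr -d '\n')

echo "lines=$line_count checksum=$checksum short=$short_lines joined=$joined"
rm -rf "$tmpdir"
```

Each alternative is a short pipeline of existing tools, which is the reasoning given for most of the rejections above.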

Additions

Note you can see the latest changes as they're added in the NEWS file (subscribe via RSS).

chroot --userspec is more efficient and generic, doing lookups both inside and outside the chroot
All tools can be built as a single multi-call binary
mv uses a reflink by default to more efficiently rename across BTRFS subvolumes
dd status=progress to update a status line once per second
od --endian to select byte order for the input data
One can now disable ls coloring with env variables indicating terminal capabilities
cp and mv now preserve xattrs when copying symlinks
cp --sparse tries harder to create holes
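
A few of these additions are easy to try from a shell. This is a rough sketch, assuming a GNU coreutils build recent enough to include od --endian and dd status=progress. The scratch directory and file names are made up for the example.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# od --endian: decode the same two bytes (01 02) as one 16-bit unit
# in each byte order; tr strips od's column padding.
big=$(printf '\001\002' | od -An -tx2 --endian=big | tr -d ' ')
little=$(printf '\001\002' | od -An -tx2 --endian=little | tr -d ' ')
echo "big=$big little=$little"

# dd status=progress: statistics go to stderr. A copy this small finishes
# well inside the one second update interval, so expect only the final summary.
dd if=/dev/zero of="$tmpdir/zeros" bs=1024 count=64 status=progress
zero_bytes=$(wc -c < "$tmpdir/zeros")

# cp --sparse=always: runs of zeros may become holes in the destination,
# while the file content itself stays identical.
cp --sparse=always "$tmpdir/zeros" "$tmpdir/zeros.sparse"
cmp "$tmpdir/zeros" "$tmpdir/zeros.sparse" && same_content=yes

rm -rf "$tmpdir"
```

To see the progress line actually update, point dd at a much larger or slower transfer.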

TODO

These new items were identified since the last update.

Implement more general detection and display of extended attributes for ls and stat
Support fractional SI suffixes for sort -h and numfmt
Have tac honor the input file offset to support skipping some input
Possibly support specified date input formats to guide parsing of dates
Support alphabetic output from seq
Support multiple fields in numfmt
Possibly support 'M' (intmax) and 'LL' (long long) size modifiers with od
Possibly support more integration of uniq and sort
Make head responsive to `stdbuf -i0`
Possibly add unicode point identification in od output
Possibly have a configurable header/content lines per page in pr
Possibly support relative symlinks with cp -s, using logic from ln --relative
Support cell/character/bytes in printf width specifiers
Indicate capabilities in "extra perms" char in ls
Support context sensitive relative dates to avoid gotchas with determining previous month etc.
Support a stat format char to output symlink names
Implement `tail -r` as equivalent to `tac` to sync up with POSIX proposal
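
The relative date gotcha and the `tac` equivalence mentioned above can both be seen with the current tools. The sketch below assumes GNU date and tac. The dates are arbitrary examples, and TZ=UTC0 is set only to keep the output independent of the local time zone.

```shell
#!/bin/sh
# Gotcha: "-1 month" from the 31st lands on the nonexistent Feb 31,
# which is normalized forward into March.
naive=$(TZ=UTC0 date -d '2014-03-31 -1 month' +%Y-%m-%d)

# Common workaround: anchor on the middle of the month before adjusting.
safe=$(TZ=UTC0 date -d '2014-03-15 -1 month' +%Y-%m)

echo "naive=$naive safe=$safe"

# What a `tail -r` would output is what tac gives today: lines in reverse.
reversed=$(printf '1\n2\n3\n' | tac)
echo "$reversed"
```

In a script the anchor date would typically be built from the current date, for example with `$(date +%Y-%m-15)`.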

These items mentioned in the last update are not done yet.

Integrate fallocate(2) into cp and mv. The interface hasn't been improved, so we'll just use it as is
posix_fallocate() is still not used due to its dependence on fallocate(2)
libunistring is now available in debian and fedora and we're about to start using it in coreutils
sort --head or sort --range to more efficiently output a subset of the input
Add an NSA/DoD verify function to shred
Handle ACLs by not using umask
Automatically use more CPU cache efficient buffer sizes in sort
Possibly integrate the threaded external sort patch
Add an inplace contrib/ script (or command) to robustly edit files in-place
Possibly add OCFS2 support to cp --reflink
cut --blank-separated
support SEEK_DATA/SEEK_HOLE in ZFS and elsewhere to efficiently process binary sparse files
wc -b -M to output frequencies of characters
multiarch support in stdbuf
rename might be a candidate for coreutils
--noatime support to various recursive traversal tools
rm --no-traverse-mount-points which would be especially useful with bind mounts
stat(1) and ls(1) support for birth time. Dependent on xstat() being provided by the kernel
fmt -w should not have such low limits
cp -u should be restartable. Currently may leave partial files in dest, or wrong files in the presence of hard links
split --confirm-create. To allow one to insert a new disk or whatever. There might be a way to do this externally?
chmod -hHLP should be supported (like BSD)
support tee --write-error={[cont],ignore,exit}
support sleep,timeout --date="..." to specify absolute times independent of suspend/resume etc.
expand seq fast path to more cases like specifying hex, adding arbitrary integers and subtraction etc.
Possibly use sendfile in cp. Not under consideration yet until benchmarked and the code made portable enough
Add a SHA-3 util. Work already started on such a GPL util elsewhere
Possibly s/--first-only/--initial/ in unexpand, to be less ambiguous and match expand
mkdir -p should be concurrent, i.e. dirs created separately while it's running shouldn't cause an error
Possibly adjust sort merging to minimize data I/O. Also unexpectedly slow merging was reported
Sort threading issues were reported on Solaris 10 and RHEL 5.8
Integrate multi-byte support for expand and unexpand using libunistring and a common core for the two utils
Possibly support tail -f --timestamp to prepend timestamps to output
Integrate support for ISO 8601 basic format as input to date
Possibly indicate "capabilities" with the ^ character instead of colors
Integrate {join,uniq} --key to provide sort(1) like field processing in join(1) and uniq(1)
Integrate tests adjustment to use rngtest
sort might use vfork() or posix_spawn() for more efficient memory handling
csplit -i to support specifying the start number when naming split files
Possibly add shuf --all-permutations to consume all input and output all permutations
cut, numfmt, sort and join should be consistent and all allow overriding the input separator option
install --preserve to give more control over copied attributes and better symmetry with cp
Improve performance of shuf --repeat by using reservoir-sampling with replacement
Possibly allow stty to set arbitrary speeds
Possibly implement deterministic handling of hardlinks by cp -R
Use copyfile() when available to support efficient remote copies and file system specific attributes
Perhaps allow specifying a --random-seed option to sort, shuf, shred to seed the PRNG
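
Until options such as --random-seed or sort --head exist, the closest equivalents use existing features. The sketch below assumes GNU shuf, sort and head. The random-source file name is hypothetical, and a fixed byte stream stands in for a seed, which makes the shuffle repeatable but not random in any meaningful sense.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# A fixed byte stream used in place of a seed (hypothetical file name).
yes | head -c 4096 > "$tmpdir/random-source"

# The same --random-source bytes give the same permutation on every run.
first=$(seq 10 | shuf --random-source="$tmpdir/random-source")
second=$(seq 10 | shuf --random-source="$tmpdir/random-source")
[ "$first" = "$second" ] && reproducible=yes

# What sort --head would optimize: today the whole input is sorted
# before head keeps the leading lines.
smallest=$(printf '5\n3\n9\n1\n' | sort -n | head -n 2)

echo "reproducible=$reproducible"
rm -rf "$tmpdir"
```

The same --random-source option is accepted by sort -R and shred, so a dedicated seed option would mainly save the step of preparing the byte source.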