This is a comparison of the support various languages and tools
have for reading and processing arbitrarily long lines of text.
All source code used for the tests is available, and I would appreciate any results others get on different systems.
Each program loops, reading a line from stdin and writing it immediately to stdout, and was tested/timed using:

    time ./test_prog <test_data.txt >/dev/null

There were 2 test files, each containing 354371 lines of text. The short-lines file (/usr/share/dict/words) had an average line length of 8 characters (excluding the LF), and the long-lines file (/usr/share/doc/*) had an average line length of 37 characters. Many runs were done and the average taken, so that no disk access was involved.
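For readers who want to reproduce the test, here is a minimal sketch of the read-a-line, write-a-line loop described above, using POSIX getline(). It is not the benchmark's actual source; the function name copy_lines is made up for illustration, and the only assumption is a POSIX.1-2008 C library.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() in <stdio.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>  /* ssize_t */

/* Copy `in` to `out` one line at a time and return the number of lines
 * copied. copy_lines is a hypothetical name, not taken from the
 * benchmark sources. */
long copy_lines(FILE *in, FILE *out)
{
    char *line = NULL;  /* buffer allocated and grown by getline() */
    size_t cap = 0;     /* current allocated size of that buffer */
    ssize_t len;        /* bytes read, including the trailing LF if any */
    long count = 0;

    assert(in != NULL && out != NULL);

    while ((len = getline(&line, &cap, in)) != -1) {
        /* fwrite with an explicit length keeps embedded NUL bytes intact */
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len)
            break;
        count++;
    }
    free(line);
    return count;
}
```

A test program would call copy_lines(stdin, stdout) from main and be timed with the command shown above. Because getline() grows its buffer as needed, the loop places no fixed limit on line length.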
---------------------------------------------------------------------------------------------
language/tool                          short lines (~8 chars)     long lines (~37 chars)
---------------------------------------------------------------------------------------------
C (getline)                            0.204s (0.200s/0.010s)     0.305s (0.260s/0.040s)
C (linebuffer_getc_unlocked)           0.156s (0.140s/0.020s)     0.363s (0.300s/0.060s)
C (linebuffer_fgets_NoNulls_unlocked)  0.264s (0.240s/0.020s)     0.441s (0.370s/0.060s)
C (linebuffer_fgets_unlocked)          0.306s (0.280s/0.020s)     0.509s (0.440s/0.060s)
C (linebuffer_fgets_NoNulls)           0.315s (0.310s/0.010s)     0.486s (0.430s/0.050s)
C (linebuffer_fgets)                   0.374s (0.360s/0.010s)     0.573s (0.520s/0.050s)
C (linebuffer_getc)                    0.424s (0.410s/0.010s)     1.395s (1.340s/0.050s)
grep (GNU 2.4.2)                       0.240s (0.230s/0.010s)     0.299s (0.260s/0.040s)
sed (GNU 3.02)                         0.270s (0.260s/0.010s)     0.361s (0.350s/0.010s)
perl (5.6.0)                           0.745s (0.730s/0.020s)     1.664s (1.620s/0.040s)
awk                                    0.941s (0.910s/0.030s)     1.108s (1.070s/0.040s)
C++                                    1.251s (1.140s/0.110s)     3.944s (3.840s/0.110s)
python (2)                             3.976s (3.960s/0.010s)     4.147s (4.080s/0.060s)
python (1)                             5.731s (5.720s/0.010s)     9.431s (9.380s/0.050s)
shell (bash 2.05.8)                    25.428s (22.040s/3.080s)   36.567s (31.780s/4.500s)
tcl (8.3.5)*
cat**                                  0.015s (0.000s/0.010s)     0.053s (0.000s/0.050s)
---------------------------------------------------------------------------------------------

*  Didn't do performance testing on tcl but it was about 250% slower than python
** cat actually does essentially the same since we're not actually changing the data :-)
   however it does illustrate the advantages of using the optimum device block size
   and bypassing stdio buffering etc.
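To illustrate the point made in the cat footnote, here is a rough sketch of a copy loop that bypasses stdio and reads in blocks of the size the kernel reports as preferred. This is not how any particular cat is implemented; the function name copy_blocks, the 4096-byte fallback, and the choice to take st_blksize from the input descriptor are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd using read()/write() directly.
 * Returns the number of bytes copied, or -1 on error.
 * copy_blocks is a hypothetical name used only for this sketch. */
long long copy_blocks(int in_fd, int out_fd)
{
    struct stat st;
    size_t blk = 4096;  /* assumed fallback if fstat() gives no hint */
    char *buf;
    long long total = 0;
    ssize_t n;

    assert(in_fd >= 0 && out_fd >= 0);

    /* st_blksize is the preferred I/O block size reported for in_fd */
    if (fstat(in_fd, &st) == 0 && st.st_blksize > 0)
        blk = (size_t)st.st_blksize;

    buf = malloc(blk);
    if (buf == NULL)
        return -1;

    while ((n = read(in_fd, buf, blk)) > 0) {
        ssize_t off = 0;
        while (off < n) {  /* write() may accept fewer bytes than asked */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                free(buf);
                return -1;
            }
            off += w;
        }
        total += n;
    }
    free(buf);
    return n < 0 ? -1 : total;
}
```

Calling copy_blocks(STDIN_FILENO, STDOUT_FILENO) gives a cat-like filter. There is no per-line work at all, which is why a tool of this kind is not directly comparable with the line-oriented entries in the table. The sketch does not retry on EINTR.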
Test host info:
---------------
System  00:00.0 Host bridge: Silicon Integrated Systems [SiS] 630 Host (rev 31)
CPU     vendor_id  : GenuineIntel
        cpu family : 6
        model      : 8
        model name : Celeron (Coppermine)
        stepping   : 10
        cpu MHz    : 847.251
        cache size : 128 KB
kernel  Linux 2.4.13 Thu Nov 15 18:06:58 GMT 2001 i686
GLIBC   VERSION="2.2.4" RELEASE="stable" HOST="i386-redhat-linux-gnu"
        CC='gcc' CCVERSION='2.96 20000731 (Red Hat Linux 7.1 2.96-98)'
        CFLAGS="-march=i386 -D__USE_STRING_INLINES -fstrict-aliasing -freorder-blocks -DNDEBUG=1 -g -O3"
gcc     CC='gcc' CCVERSION='2.96 20000731 (Red Hat Linux 7.1 2.96-98)'
        CFLAGS="-O9"
© Mar 29 2006