Testing awk
I’ve been testing wak against the other awk implementations I’ve been able to obtain. I have recent versions of nnawk (Kernighan’s One True Awk, the original Unix awk updated), gawk, mawk, goawk, and bbawk (busybox awk).
As of this writing, the versions are:
nnawk: awk version 20231228 (compiled 2024-01-23)
gawk: GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1 (source 2023-11-02; compiled 2023-11-19)
mawk: mawk 1.3.4 20231102 (compiled 2023-11-16)
goawk: V1.25.0 (compiled 2024-01-23)
bbawk: 2023-12-31 (compiled 2024-01-23)
I’m sure there are plenty of bugs. Brian Kernighan maintained the original Unix awk from its start almost 40 years ago until recently. Its “FIXES” file has over 200 entries from 1987 to 2023, and the fixes continue to the present in a separate file for the “second edition” of One True Awk. If Kernighan has been fixing bugs for 35+ years, I doubt I can make a bug-free awk.
Some “bugs” are really differences in how implementations interpret certain awk features. No two versions of awk (original awk, gawk, mawk, goawk, Busybox awk, etc.) agree completely.
Please report bugs to raygard at gmail.com.
Testing strategy
I have used the test files that come with existing awk implementations, plus some I’ve written. The original One True Awk comes with a folder testdir of about 315 files. Kernighan’s README.TESTS file says there are about 60 small tests p.* from the first two chapters of The AWK Programming Language (1st ed.) that are basic stuff; about 160 small tests t.* that are “a random sampling of awk constructions collected over the years. Not organized, but they touch almost everything.”; and about 20 tt.* files that are timing tests.
The testdir folder also has about 30 T.* files that are “more systematic tests of specific language features”. Unfortunately, these are shell scripts that each test one awk implementation at a time, checking its output against known-good data built into the scripts. That makes them hard to use for comparing my implementation against all the others in one pass, but I can run the scripts on wak separately.
gawk also comes with a folder of about 1475 files, most of which are sets of foo.awk, foo.in, and foo.ok files. In each case, the foo.awk file is run with foo.in as input and the result can be compared with foo.ok. Some are standalone tests that do not need an input file, so there is sometimes no foo.in file.
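A minimal sketch of running one such test could look like the following. It is not the actual test_awk.py code: the function name `run_gawk_style_test` and its parameters are hypothetical, and feeding foo.in on standard input (rather than as a file argument) is an assumption.

```python
import subprocess
from pathlib import Path


def run_gawk_style_test(awk_cmd, test_dir, name, timeout=30):
    """Run name.awk with name.in (if present) on stdin; compare stdout to name.ok.

    awk_cmd is the command prefix as a list, e.g. ["gawk"] or ["busybox", "awk"].
    Returns (passed, returncode, stderr_bytes).
    """
    test_dir = Path(test_dir)
    infile = test_dir / f"{name}.in"
    # Standalone tests have no .in file; give them empty stdin (assumption).
    stdin_data = infile.read_bytes() if infile.exists() else b""
    proc = subprocess.run(
        [*awk_cmd, "-f", str(test_dir / f"{name}.awk")],
        input=stdin_data,
        capture_output=True,
        timeout=timeout,
    )
    expected = (test_dir / f"{name}.ok").read_bytes()
    # Byte-for-byte comparison of stdout only; stderr is returned separately.
    return proc.stdout == expected, proc.returncode, proc.stderr
```

A call such as `run_gawk_style_test(["gawk"], "tests/gawktests", "double1")` would then report whether that one test passes for that one implementation; a driver would loop over tests and implementations.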
I have a not-very-neat test driver, test_awk.py, that I can use to run a batch of tests, such as all t.* in testdir, against several awk implementations at one time and see how they compare. The p.* and t.* files in testdir are intended to use certain input files (test.countries and test.data), and the outputs are compared via MD5 hashes. Each unique output is saved for later examination. For the gawk-style tests, the program can compare the output against the foo.ok file and give a pass/fail result. If there is a non-zero return code or an exception, that is noted in the test_awk.py output. The output looks like this:
======= ======= ======= ======= ======= ======= =======
====versions====>>>> gawk nnawk mawk goawk bbawk tbawk muwak
==== ==== ==== ==== ==== ==== ====
[...]
Test delarpm2.awk dd8e2e5 8841567 NNAWK c992867 gawk GOAWK GOAWK
Test dfacheck1.awk 03a19ad 0000000 NNAWK NNAWK gawk !3a19ad TBAWK
ERR: tbawk: awk: file tests/gawktests/dfacheck1.awk line 1: warning: '\<' -- unknown regex escape
ERR: muwak: muwak: file tests/gawktests/dfacheck1.awk line 1: warning: '\<' -- unknown regex escape
Test double1.awk 8c7dbdf 819e6db gawk 351564a d97b6e5 BBAWK BBAWK
Test double2.awk 4dbdb44 4941b67 gawk 7a665a2 acfcfe0 0124355 TBAWK
Test dtdgport.awk a916caa gawk gawk gawk !!00000 gawk gawk
RET: bbawk: 1
ERR: bbawk: awk: tests/gawktests/dtdgport.awk:37: %*x formats are not supported
The hex values are the first 7 digits of the MD5 of the output file. If the output is an empty file, the MD5 is replaced with all zeroes to make it easier to spot. If any stderr output occurs, the first digit is replaced with a ‘!’; if a non-zero return code occurs, the second digit is replaced with a ‘!’. In either case, the stderr output and return code are printed. These (non-pass-fail, non-timing) reports always display a hash value of the output in the first column (i.e. for the first awk version tested). In each subsequent column, if the hash matches a hash from a previous column, the awk version of that column is listed instead, up-cased unless that column is the first one; otherwise the hash itself is listed.
So for example, for delarpm2.awk, nnawk has a different output from gawk, mawk matches nnawk, goawk has yet another different output, bbawk matches gawk, and both tbawk (toybox awk, my awk for toybox) and muwak (my awk compiled with musl libc) match goawk. For dfacheck1.awk, gawk gave some output, nnawk produced no output, mawk and goawk also produced no output (matching nnawk), bbawk matched gawk, tbawk produced the same output as gawk but had stderr output, and muwak matched tbawk, including having stderr output.
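The cell rules described above can be sketched roughly as follows. This is an illustration, not the actual test_awk.py code: the function name `format_row` and the tuple layout are hypothetical, and matching on the ‘!’-marked string (so that a column with stderr output does not count as matching one without) is inferred from the dfacheck1.awk row in the sample.

```python
import hashlib


def format_row(names, results):
    """Build one report row.

    names:   column names in order, e.g. ["gawk", "nnawk", ...]
    results: one (stdout_bytes, stderr_bytes, returncode) tuple per column
    """
    cells = []
    seen = {}  # marked hash -> index of the first column that produced it
    for i, (out, err, rc) in enumerate(results):
        # Empty output is shown as all zeroes so it is easy to spot.
        digits = list("0000000" if not out else hashlib.md5(out).hexdigest()[:7])
        if err:
            digits[0] = "!"  # stderr output occurred
        if rc != 0:
            digits[1] = "!"  # non-zero return code
        marked = "".join(digits)
        if marked in seen:
            first = seen[marked]
            # Matching the first column is the good case (lower case);
            # matching any later column is shown up-cased.
            cells.append(names[first] if first == 0 else names[first].upper())
        else:
            seen[marked] = i
            cells.append(marked)
    return cells
```

Because the first column never has an earlier column to match, it always shows a hash, as in the sample report.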
The gawk tests were originally intended to be run via the supplied Makefile, and some of them use special gawk options, environment setup, etc., so when a foo.awk file is run by test_awk.py it may not produce the correct foo.ok output even from gawk. Because of this, I sifted the output from all the gawk tests against all the awk versions into several parts and moved the tests into corresponding folders: gawktests/allfail has tests that fail for all versions, including gawk; gawktests/allpass has tests that pass for all versions; gawktests/gawkonly has tests that pass for gawk and fail for all others (usually because they use gawk-only features); and gawktests has all the remaining tests.
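As a small illustration of that sifting rule (the function name `gawk_test_folder` and the dictionary layout are hypothetical, not taken from my scripts), the folder for a test can be chosen from its per-implementation pass/fail results:

```python
def gawk_test_folder(passed):
    """Pick a folder for one gawk test.

    passed maps implementation name -> True/False and must include "gawk".
    """
    others = [ok for name, ok in passed.items() if name != "gawk"]
    if not any(passed.values()):
        return "gawktests/allfail"   # fails everywhere, including gawk
    if all(passed.values()):
        return "gawktests/allpass"   # passes everywhere
    if passed["gawk"] and not any(others):
        return "gawktests/gawkonly"  # passes only for gawk
    return "gawktests"               # mixed results: the interesting ones
```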
I also wrote a shell script and awk script to sift the resulting test output files into several categories. I usually have test results in columns for gawk, nnawk, mawk, goawk, bbawk, my awk within toybox (tbawk), and my awk standalone (which may be compiled with the ASAN sanitizer, or with musl libc, or as some other variant). The order is significant because I consider my result golden if it matches both gawk and nnawk, still good if it matches gawk or nnawk, and less good if it matches (only) mawk, goawk, or bbawk.
If my awks (the last two columns) differ, the test goes into a set_y file; that’s usually a result of for (element in array) iterating the array in random order (as all awks currently do). (An annoyance for testing is that goawk doesn’t usually do it the same way on different runs, due to golang’s intentionally random hash behavior that apparently cannot be turned off.) If any “allfail” tests do not all fail, or any “allpass” tests do not all pass, or any “gawkonly” tests do not pass only for gawk, they go into separate files.
If a test is pass/fail, then I put tests that my awk fails into a set_fail file; if it passes and both gawk and nnawk pass, it goes into a set_ok file; if it matches gawk or nnawk, it goes into a set_1 or set_2 file; otherwise it goes into a general set_pass file. If it’s not pass/fail, then if my result matches gawk and nnawk, it goes into a set_ok file; else if it matches gawk, it goes into the set_1 file; else if it matches nnawk, it goes into the set_2 file; else if it matches mawk, goawk, or bbawk, it goes into a set_3, set_4, or set_5 file respectively. If it doesn’t fit into any of those buckets, then it doesn’t match any other implementation, and it goes into a set_x file.
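The bucket rules can be summarized in a short sketch. My real sifting is done with a shell script and an awk script; the Python below is only an illustration, and the function names, the dictionary layout, and the choice of tbawk and muwak as "my" two columns are assumptions. The separate files for misbehaving “allfail”, “allpass”, and “gawkonly” tests are left out.

```python
def classify_hash_test(outputs, mine="tbawk", mine2="muwak"):
    """Bucket a non-pass/fail test; outputs maps column name -> output hash."""
    if outputs[mine] != outputs[mine2]:
        return "set_y"  # my two builds disagree with each other
    m = outputs[mine]
    matches_gawk = outputs["gawk"] == m
    matches_nnawk = outputs["nnawk"] == m
    if matches_gawk and matches_nnawk:
        return "set_ok"
    if matches_gawk:
        return "set_1"
    if matches_nnawk:
        return "set_2"
    for name, bucket in (("mawk", "set_3"), ("goawk", "set_4"), ("bbawk", "set_5")):
        if outputs[name] == m:
            return bucket
    return "set_x"  # matches no other implementation: look here first


def classify_passfail_test(passed, mine="tbawk", mine2="muwak"):
    """Bucket a pass/fail test; passed maps column name -> True/False."""
    if passed[mine] != passed[mine2]:
        return "set_y"  # assumption: disagreement is judged on pass/fail here
    if not passed[mine]:
        return "set_fail"
    if passed["gawk"] and passed["nnawk"]:
        return "set_ok"
    if passed["gawk"]:
        return "set_1"
    if passed["nnawk"]:
        return "set_2"
    return "set_pass"
```

Checking gawk and nnawk before the other implementations is what makes the column order significant: a result that matches both gawk and mawk lands in set_1, not set_3.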
So the set_x file needs the closest scrutiny, as those are most likely bugs in my implementation. Currently, I have 32 tests in that bucket out of 1180 tests run. Here is an approximate breakdown of the current test results:
set_1 | 86 |
set_2 | 158 |
set_3 | 52 |
set_4 | 8 |
set_5 | 13 |
set_badfail | 2 |
set_badpass | 2 |
set_fail | 48 |
set_ok | 735 |
set_pass | 13 |
set_x | 32 |
set_y | 31 |