Awka - Specifics

This is a basic question & answer section. If you want to know something that is not described here, feel free to ask me (andrewsumner@yahoo.com).

Does Awka faithfully implement AWK?

Yes. AWK programs compiled using Awka should behave exactly the same as they would in Gawk, the most popular free AWK interpreter. I also try to eliminate conflicts with the POSIX standard for AWK whenever I become aware of them.

What standard AWK features are missing?

The support in awka-generated executables for -v var=value and var=value on the command line is not quite the same as in other AWKs. At translation time Awka identifies all global variable names used by a script, and these are hard-coded into the C code it produces. Hence if the var named on the command line is not one of the variables used in the script, the assignment will be ignored, and if the -v option is missing, the argument will be treated as a filename.

Please note that var=value without a preceding -v has only been supported since version 0.7.3.

What Gawk features are missing?

These are features unique to Gawk, that are not part of the POSIX AWK definition:-

- The IGNORECASE variable is not supported.
- The pseudo /dev/xxx process-id functionality is not supported. Only the /dev/stderr and /dev/stdout pseudo devices are present in Awka, otherwise the device must actually exist. As of release 3.1.0 Gawk is also phasing out support for /dev/pid etc., introducing the PROCINFO array instead. Awka does not yet support PROCINFO, though the array name has been reserved for use as a builtin variable.
- Gawk introduces "positional specifiers" in printf formats, which allow you to specify which argument is used. Awka does not as yet support this.
- Gawk's --lint command-line option and LINT variable are not supported. Instead Awka provides its own warning messages using -w. Type awka --help or see the awka manpage for more information on what this does.
- Gawk's BINMODE variable is not supported.
- Awka has not been internationalised, and does not have the new TEXTDOMAIN variable that Gawk 3.1.0 introduces for this purpose.

What additional features does Awka provide?

Quite a few, in the form of extra builtin functions.

Please note that eventually most of these will be moved into an extension library, rather than having them present as arbitrary enhancements where they may not be needed.

How do I use Awka?

The first part is using the translator program awka to convert your AWK programs to C. awka is a command-line program, as are Gawk, Mawk and others. The command-line options are:-

-c fn     Instead of generating a 'main' function, awka will name the generated function fn. This is used to integrate translated code into a larger C or C++ application.
-x        awka will translate, then automatically compile and execute your program. It produces the files awka_out.c and awka.out.
-t        If -x is present, this will remove the .c and executable files after the program has been run.
-X        This will create the C file awka_out.c and compile the executable awka.out, but not run it.
-f file   Specifies AWK program files.
--        If -x is present, all arguments after this point will be passed to the compiled executable when it is run.
-v        Prints version information.

The command-line options provide a range of ways in which Awka may be used, from output of C code to stdout, through to automatic creation & running of an executable. Some examples...

Compiled executables accept the following command-line options:-

-F fs          This sets FS to a value.
-v var=value   This sets variable var to a value. The executable will complain if the variable does not exist, or if it is not scalar.
-Wi            Input will be line-buffered, and RS will be set to '\n'.
-We            Arguments following this will be passed to the program in the ARGV array.

Tips for optimizing AWK scripts

Implementations of AWK vary in what (if any) performance enhancements they provide. In general, I try to make Awka as fast as possible executing every facet of the language, and where I can I introduce short-cuts to avoid unnecessary processing. Therefore the suggestions below are most relevant to Awka, and may or may not make other AWKs run faster (but they definitely won't slow them down).

How to use Functions

Avoid, if you can, short functions of only two or three lines. Also, where you must use a function, don't have unnecessary local variables and don't use scalar variables as arguments, particularly if they're strings. Every scalar variable used as an argument must be copied to a new, temporary variable to be used inside the function, and each of these must be specially managed by the interpreter or libawka - all of this adds up to extra execution time.

So when do you use functions? When the code in a non-function section is getting too long. If you find you have several hundred lines in one section of your script, chances are breaking it into functions will allow it to run faster, as C compilers struggle to optimise really large blocks of code effectively (obviously this only applies to Awka). Speed aside, avoiding large and unstructured blocks of code can make your script significantly easier to manage as well.
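
As a rough illustration of the advice on short functions, the sketch below computes the same total twice: once through a one-line helper function and once with the expression written inline. The function name double and the sample input are made up for this example, and any speed difference will depend on the AWK implementation and the size of the input.

```shell
# Version 1: a tiny helper function. Its scalar argument is copied on every call.
with_func=$(printf '10\n20\n30\n' | awk '
function double(v) { return v * 2 }
{ total += double($1) }
END { print total }')

# Version 2: the same expression written inline, with no function call.
inline=$(printf '10\n20\n30\n' | awk '
{ total += $1 * 2 }
END { print total }')

# Both versions print the same total.
echo "$with_func $inline"
```

Both variables hold the same result, so the inline form can replace the helper without changing the program's output.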

Minimise use of $0 ... $n variables

This extra checking will slow your program down. If, for example, your program reads the value of $1 repeatedly, consider setting a scalar variable to its value and using this variable instead. Here is an example, cut & paste straight from a terminal window:-
$ time gawk 'BEGIN { x1="hello"; x2="there"; for (i=0; i<200000; i++) x = x1" "x2 }'
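
The suggestion to copy a field into a scalar variable can be sketched as follows. This is a minimal example with made-up input and variable names (v, s), not a benchmark: the first program reads $1 three times per record, the second reads it once and reuses the scalar.

```shell
# Reads the field $1 three times for every input record.
direct=$(printf '3\n4\n' | awk '{ s += $1 * $1 + $1 } END { print s }')

# Copies $1 into the scalar v once, then uses v in the expression.
cached=$(printf '3\n4\n' | awk '{ v = $1; s += v * v + v } END { print s }')

# Both forms give the same answer.
echo "$direct $cached"
```

To compare timings on your own AWK, prefix each awk command with time and feed it a much larger input.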

Minimise Array usage

Ok, that probably didn't make sense. Let's try an example. Instead of:-
($1 in arr) { for (i=0; i<arr[$1]; i++) z[$1,i] = arr[$1] + i }
you could use:-
($1 in arr) { idx=$1; a=arr[idx]; for (i=0; i<a; i++) z[idx,i] = a + i }
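
To check that the two forms above are interchangeable, the sketch below runs both against the same small input and summarises the resulting z array. The contents of arr and the input lines are invented for the test; only the two pattern-action rules come from the text.

```shell
# Original form: arr[$1] and $1 are looked up again on every loop iteration.
before=$(printf 'a\nb\nc\n' | awk '
BEGIN { arr["a"] = 2; arr["b"] = 3 }
($1 in arr) { for (i=0; i<arr[$1]; i++) z[$1,i] = arr[$1] + i }
END { for (k in z) { n++; sum += z[k] } print n, sum }')

# Rewritten form: the field and the array element are read once into scalars.
after=$(printf 'a\nb\nc\n' | awk '
BEGIN { arr["a"] = 2; arr["b"] = 3 }
($1 in arr) { idx=$1; a=arr[idx]; for (i=0; i<a; i++) z[idx,i] = a + i }
END { for (k in z) { n++; sum += z[k] } print n, sum }')

# Each prints the number of elements in z and the sum of their values.
echo "$before / $after"
```

Both programs build five elements in z with the same values, so the rewrite changes only how often the array and field are accessed.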

Use Integers to Index Arrays

In Awka you needn't even set the index to numeric form - merely reading it as a number is sufficient for the integer value to be used in arrays (i.e. x+0; array[x]="hello"), however this optimisation doesn't exist in Mawk. This technique can provide dramatic improvements in execution time and RAM usage, so it's worthwhile remembering.
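
A portable way to apply this, sketched under the assumption that the index arrives as an input field, is to assign the numeric value to a variable and index with that. The names id and count and the sample input are hypothetical; the point is that adding 0 turns differently written values into one integer index.

```shell
# The three input lines spell the same number in different ways.
# id = $1 + 0 forces a numeric value, so all three share the index 7.
keys=$(printf '7\n007\n7.0\n' | awk '
{ id = $1 + 0; count[id]++ }
END { for (k in count) print k, count[k] }')

# One index, counted three times.
echo "$keys"
```

Assigning the result (id = $1 + 0) rather than only reading it as a number keeps the technique independent of Awka's shortcut described above.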

Scalar Variable Type Switching

On a similar note, if you have scalar variables that contain regular expressions, don't switch them back and forth between RE and another type - this applies to all AWK implementations.

What other free AWKs are there?

Where can I find more information about AWK?

Andrew Sumner (andrewsumner@yahoo.com)