
Awka - Specifics





This is a basic question & answer section. If you want to know something that is not described here, feel free to ask me (andrewsumner@yahoo.com).

Does Awka faithfully implement AWK?

Yes. AWK programs compiled using Awka should behave exactly the same as they would in Gawk, the most popular free AWK interpreter. I also try to eliminate conflicts with the POSIX standard for AWK whenever I become aware of them.

What standard AWK features are missing?

The support in awka-generated executables for -v var=value and var=value on the command line is not quite the same as in other AWKs. At translation time Awka identifies all global variable names used by a script, and these are hard-coded into the C code it produces. Hence, if the var specified on the command line is not one of the variables used in the script, the assignment will be ignored or, if the -v option is missing, treated as a filename.

Please note that var=value without a preceding -v has only been supported since version 0.7.3.
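
For comparison, here is a minimal sketch of the two assignment forms as an ordinary AWK interpreter handles them. It uses plain awk rather than an Awka executable, and the variable name greeting is purely an illustration. Because greeting is used in the script, it is the kind of name that Awka, per the description above, would know about at translation time.

```shell
# Plain awk, not an Awka-compiled executable.
# Form 1: -v assigns the variable before the program starts.
with_v=$(printf 'x\n' | awk -v greeting=hello '{ print greeting }')

# Form 2: a bare var=value operand is processed when awk reaches it
# in the argument list, before the input that follows it is read.
without_v=$(printf 'x\n' | awk '{ print greeting }' greeting=hello)

echo "$with_v"
echo "$without_v"
```

Both commands print hello with an ordinary AWK.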

What Gawk features are missing?

These are features unique to Gawk that are not part of the POSIX AWK definition.

The IGNORECASE variable is not supported.

The pseudo /dev/xxx process-id functionality is not supported. Only the /dev/stderr and /dev/stdout pseudo devices are present in Awka; otherwise the device must actually exist. As of release 3.1.0, Gawk is also phasing out support for /dev/pid etc., introducing the PROCINFO array instead.

Awka does not yet support PROCINFO, though the array name has been reserved for use as a builtin variable.

Gawk introduces "positional specifiers" in printf formats (for example %2$s), which allow you to specify which argument is used. Awka does not as yet support this.

Gawk's --lint command-line option and LINT variable are not supported. Instead Awka provides its own warning messages using -w. Type awka --help or see the awka manpage for more information on what this does.

Gawk's BINMODE variable is not supported.

Awka has not been internationalised, and does not have the new TEXTDOMAIN variable that gawk 3.1.0 introduces for this purpose.

What additional features does Awka provide?

Quite a few, in the form of extra builtin functions.
  • totitle converts a string to Title Case.
  • ascii returns the ASCII code of a character.
  • asort array sorting functionality as per Gawk 3.1.0.
  • char returns the character value of an ASCII code.
  • left returns the leftmost n characters of a string.
  • right returns the rightmost n characters of a string.
  • ltrim trims whitespace (or other character) from left of a string.
  • rtrim trims characters from right of a string.
  • trim trims characters from both left and right of a string.
  • and returns the result of a bitwise AND of two integers.
  • or returns the result of a bitwise OR of two integers.
  • xor returns the result of a bitwise XOR of two integers.
  • compl returns the complement of an integer.
  • lshift shifts bits in an integer to the left.
  • rshift shifts bits in an integer to the right.
  • min returns the lowest number in a list of up to 255 arguments.
  • max returns the highest number in a list of up to 255 arguments.
  • mktime as per the mktime function in Gawk 3.1.0.
  • time returns the number of seconds since 1/1/1970, for a user-specified time.
  • gmtime returns formatted time string set to Greenwich Mean Time.
  • localtime returns formatted time string set to local time.
  • alength returns the number of elements in an array.
  • argcount returns the number of arguments to a function.
  • argval allows you to address function arguments by number.
  • getawkvar allows you to address global variables using a string value.
It is quite possible that existing AWK programs will contain variables and/or functions using the names of the above functions. Where this is the case, Awka will prefer the use of the name as a variable or function over its use as the name of a builtin function. This means existing programs will not be broken by the above extensions!

Please note that eventually most of these will be moved into an extension library, rather than having them present as arbitrary enhancements where they may not be needed.
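
As a small sketch of the name clash described above, the script below defines its own min function. The function and its arguments are just an illustration. It runs under any ordinary AWK, and according to the rule above Awka would likewise use the script's definition rather than its builtin min.

```shell
# A script that defines its own min(); per the rule above, this
# definition takes priority over a builtin of the same name.
result=$(awk '
function min(a, b) { return (a < b) ? a : b }   # user-defined version
BEGIN { print min(7, 3) }')

echo "$result"
```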

How do I use Awka?

The first part is using the translator program awka to convert your AWK programs to C. awka is a command-line program, as are Gawk, Mawk and others. The command-line options are:-

-c fn Instead of generating a 'main' function, awka will name the generated entry function fn. This is used to integrate translated code into a larger C or C++ application.

-x awka will translate, then automatically compile and execute your program. It produces the files awka_out.c and awka.out.

-t If -x is present, this will remove the .c and executable files after the program has been run.

-X This will create the C file awka_out.c and compile the executable awka.out, but not run it.

-f file Specifies AWK program files.

-- If -x is present, all arguments after this point will be passed to the compiled executable when it is run.

-v Prints version information.

The command-line options provide a range of ways in which Awka may be used, from output of C code to stdout, through to automatic creation & running of an executable. Some examples...
    awka -f myprog.awk >myprog.c
This will translate myprog.awk to C, storing the contents in myprog.c.
    awka -X -f myprog.awk -f other.awk
This will translate the two AWK programs, creating an executable.
    awka -x -f myprog.awk -- -F"," input.txt
Awka will create an executable, then run it with the arguments following the double-hyphens. By comparison, you would do the same using Gawk by typing gawk -f myprog.awk -F"," input.txt.

Compiled executables accept the following command-line options:-

-F fs This sets FS to a value.

-v var=value This sets variable var to a value. The executable will complain if the variable does not exist, or if it is not scalar.

-Wi Input will be line-buffered, and RS will be set to '\n'.

-We Arguments following this will be passed to the program in the ARGV array.

Tips for optimizing AWK scripts

Implementations of AWK vary in what (if any) performance enhancements they provide. In general, I try to make Awka as fast as possible executing every facet of the language, and where I can I introduce short-cuts to avoid unnecessary processing.

Therefore the suggestions below are most relevant to Awka, and may or may not make other AWKs run faster (but they definitely won't slow them down).

How to use Functions
Even in compiled languages such as C, function calls slow execution ever so slightly. In most AWKs, including Awka, the extra processing required to call functions is considerable.

Avoid, if you can, short functions of only two or three lines. Also, where you must use a function, don't have unnecessary local variables and don't use scalar variables as arguments, particularly if they're strings. Every scalar variable used as an argument must be copied to a new, temporary variable to be used inside the function, and each of these must be specially managed by the interpreter or libawka - all of this adds up to extra execution time.

So when do you use functions? When the code in a non-function section is getting too long. If you find you have several hundred lines in one section of your script, chances are breaking it into functions will allow it to run faster, as C compilers struggle to optimise really large blocks of code effectively (obviously this only applies to Awka). Speed aside, avoiding large and unstructured blocks of code can make your script significantly easier to manage as well.
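
To illustrate the advice about short functions, here is a rough sketch with a hypothetical one-line helper called add, followed by the same computation with the helper inlined. Both give the same answer. The inlined form simply avoids the per-call argument copying described above. No timings are claimed here.

```shell
# Version with a tiny helper function, called once per record.
with_func=$(printf '1 2\n3 4\n' | awk '
function add(a, b) { return a + b }
{ total += add($1, $2) }
END { print total }')

# Same computation with the helper inlined into the rule.
inlined=$(printf '1 2\n3 4\n' | awk '{ total += $1 + $2 } END { print total }')

echo "$with_func $inlined"
```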

Minimise use of $0 ... $n variables
These variables require special consideration by the AWK interpreter. When you read the value of $0, it must check for changes to $1..$n or to NF, and if necessary rebuild $0 before providing the value. Similarly, reading the value of $1 means that the program must check to see if $0 has changed, and if necessary re-evaluate $1.

This extra checking will slow your program down. If, for example, your program reads the value of $1 repeatedly, consider setting a scalar variable to its value and using this variable instead.

Here is an example, cut & paste straight from a terminal window:-

$ time gawk 'BEGIN { x1="hello"; x2="there"; for (i=0; i<200000; i++) x = x1" "x2 }'
gawk 1.08s user 0.01s system
$ time gawk 'BEGIN { $1="hello"; $2="there"; for (i=0; i<200000; i++) x = $1" "$2 }'
gawk 1.30s user 0.01s system
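
In an ordinary rule, the same idea might look like the sketch below, where the field is read once into a scalar (the names key and s are arbitrary) and the scalar is reused inside the loop.

```shell
out=$(printf 'ab 1\ncd 2\n' | awk '{
    key = $1                 # read the field once per record
    for (i = 0; i < 3; i++)
        s = s key            # reuse the scalar instead of $1
}
END { print s }')

echo "$out"
```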

Minimise Array usage
Putting the same array reference again and again in your program is wasteful. All AWKs implement arrays using a technique known as hashing. While this is usually fast, it still involves extra processing that should be avoided if execution speed is important. Set a scalar variable to the value in the array, and use this variable instead of the superfluous array references.

Ok, that probably didn't make sense. Let's try an example. Instead of:-

($1 in arr) { for (i=0; i<arr[$1]; i++) z[$1,i] = arr[$1] + i }

you could use:-

($1 in arr) { idx=$1; a=arr[idx]; for (i=0; i<a; i++) z[idx,i] = a + i }

Use Integers to Index Arrays
Both Awka and Mawk will run much faster if the value used as the array index is an integer. If, for instance, your index is in string form but is actually an integer (eg. x="402957"; array[x]="hello"), set 'x' to numeric form (x += 0).

In Awka you needn't even set the index to numeric form. Merely reading it as a number is sufficient for the integer value to be used in arrays (ie. x+0; array[x]="hello"). This optimisation doesn't exist in Mawk, however.

This technique can provide dramatic improvements in execution time and RAM usage, so it's worthwhile remembering.
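
A minimal sketch of the conversion, using the same values as the example above. It shows only that the element is still reachable through the integer index after x += 0. Any speed or memory benefit depends on the AWK implementation.

```shell
out=$(awk 'BEGIN {
    x = "402957"          # index held in string form
    x += 0                # force numeric form before indexing
    array[x] = "hello"
    print array[402957]   # same element, addressed by integer
}')

echo "$out"
```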

Scalar Variable Type Switching
If you manage your scalars so that they are always the same type, either number or string, then Awka can produce more efficient C code that avoids calls to libawka and hence will run faster. If your script is math-intensive this can make a significant difference.

On a similar note, if you have scalar variables that contain regular expressions, don't switch them back and forth between RE and another type - this applies to all AWK implementations.
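
As a rough illustration of keeping scalar types stable, the sketch below uses one variable only as a number and a separate one only as a string, rather than reusing a single variable for both roles. The names sum and label are arbitrary, and how much this helps will depend on the script.

```shell
out=$(printf '3\n4\n' | awk '
{ sum += $1 }                 # sum is only ever used as a number
END {
    label = "total"           # label is only ever used as a string
    print label ": " sum
}')

echo "$out"
```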

What other free AWKs are there?

Where can I find more information about AWK?





Andrew Sumner - (andrewsumner@yahoo.com)