My very first job getting paid to write software was writing scripts in Awk to parse and analyze software log files for a faculty software researcher, in maybe 1997. I didn't know Awk before; it's just what I inherited. I spent a few hours with the O'Reilly book, and I was like, okay, sure, let's go.
As the stuff we were doing in that project got more complex, at some point someone suggested to teenage me "You might want to look at Perl for this now," and then I moved to that. (with the Camel O'Reilly book, of course!)
Haven't touched either one in years now.
Learning new things can be much more overwhelming for me now, I don't know how much is me vs environment. But I am nostalgic for those days where I'd sit down with a print book, and within hours have a grasp of the fundamentals, or within days feel like I had basic fundamental conceptual understanding of the whole dang thing (not of every possible feature, but of the conceptual framework, the big picture).
I read a different book, written by the creators of AWK themselves. But the experience was much the same. You can read it in one sitting.
Or, rather, you can easily read the first two chapters in one sitting. Chapter one gives a brief overview and examples. Chapter two describes the whole language, every function and every variable! The rest of the book is just more examples. I really love this style!
If I write more than about 50 bytes of awk I inevitably end up using perl instead because it's much more powerful. One example was I wanted to convert a column in format a|b|c to ['a', 'b', 'c']. Doing that in awk is painful. In Perl, there's join and map, takes a few seconds.
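For the record, here's roughly what the awk side of that looks like (a sketch; the column number and output quoting are my assumptions):

  # join.awk -- turn a pipe-separated column (here, $2) into ['a', 'b', 'c']
  {
    n = split($2, parts, "|")
    out = ""
    for (i = 1; i <= n; i++)
      out = out (i > 1 ? ", " : "") "'" parts[i] "'"
    print "[" out "]"
  }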
One of the reasons is that the frameworks such a book would use change completely every 6-12 months. If you write a book that uses k8s, after 12-24 months the examples you use might not even run.
Whereas if I pick up the Awk book the OP referred to, I could likely still use it to learn; you can't really do the same with most of the modern tech stack.
I'm already a casual awk enthusiast, but I'm really hoping to find an opportunity to use it for a "real" software project soon. I've been reading the gawk user manual, and suffice it to say, the power and features of the language are dramatically underutilized in most of the things people normally do with it (my most common use case is probably a hybrid of grep and cut).
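For what it's worth, that grep/cut hybrid is exactly the pattern awk handles in one pass (the pattern, column, and file name here are made up):

  # filter and project in a single process
  awk '/ERROR/ { print $3 }' app.log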
I wrote an IRC bot in it, one of those "paste a line of code and the bot will evaluate it and print the result" bots that you find in programming language channels. It's not a particularly big or "real" project, but it definitely fulfills the need of having a bot in that particular IRC channel.
awk is great for it because IRC (or at least the subset that the bot cares about) is relatively easy to parse, and shelling out to the shell script that does the actual code evaluation and printing the result back is also fairly straightforward. Someone else used to have such a bot before, but they had written it in Rust with a bajillion dependencies; if I had done that I would've had to update dependencies and redeploy it every other week. In contrast, I deployed my awk version once and basically haven't touched it in years.
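Not the actual bot, but a sketch of the shape it can take (eval.sh and the !eval trigger are hypothetical):

  # bot.awk -- read IRC lines on stdin, answer PING, react to !eval
  { sub(/\r$/, "") }                     # IRC lines end in CRLF

  $1 == "PING" {
    print "PONG " $2
    fflush()                             # keep the connection alive
    next
  }

  $2 == "PRIVMSG" {
    # :nick!user@host PRIVMSG #chan :some message
    chan = $3
    msg = $0
    sub(/^[^:]*:[^:]*:/, "", msg)        # keep the text after the second ":"
    if (msg ~ /^!eval /) {
      sub(/^!eval /, "", msg)
      # hand the snippet to a helper script; a real bot must sanitize/quote
      cmd = "./eval.sh \"" msg "\""
      while ((cmd | getline out) > 0)
        print "PRIVMSG " chan " :" out
      close(cmd)
    }
    fflush()
  }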
This is a cool way to make awk talk over a socket without relying on anything Gawk-specific. For sockets without TLS you can replace openssl(1) with nc(1). I'll keep it in mind.
I recently wrote a program of slightly over 200 lines in portable AWK: https://gitlab.com/dbohdan/humsize. I wrote it for a specific operating system (NetBSD), but I have ended up using it everywhere; the portability helped. I can recommend AWK for small utilities that transform text in a line- and column-oriented manner and don't need libraries.
The main difficulties were making the command-line interface and testing. You can't have flags that begin with a dash in portable AWK without a shell wrapper, and I didn't want one. I settled on manually parsing key=value options, which I don't think are bad, just nonstandard. They look like this:
humsize format=%6.1f%1s 'zero= empty'
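The parsing itself is just a BEGIN loop over ARGV; something along these lines (a sketch, not the humsize code, and it assumes no input file name contains "="):

  # collect key=value arguments and hide them from awk's file handling
  BEGIN {
    for (i = 1; i < ARGC; i++) {
      eq = index(ARGV[i], "=")
      if (eq > 1) {
        opt[substr(ARGV[i], 1, eq - 1)] = substr(ARGV[i], eq + 1)
        ARGV[i] = ""               # an empty ARGV entry is skipped, not opened
      }
    }
    fmt = ("format" in opt) ? opt["format"] : "%6.1f%1s"
  }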
There is no standard way to test AWK code. For testing I wrote a shell script that checks the program's outputs with grep: https://gitlab.com/dbohdan/humsize/-/blob/122aaed8d65dc8c285.... Don't do this; your tests should give the user (you) better feedback. You may think your program doesn't need anything but a couple of trivial tests that won't ever change; it is a pain when you are inevitably proven wrong. I should have instead kept a directory with reference outputs and diffed against them to see what went wrong (my own example: https://github.com/dbohdan/initool/blob/72f65d3fde245ff8660c...).
To ensure I didn't introduce portability issues, I set up testing against different awks in GitLab CI.
I’m usually not a big side-project guy, but I successfully used AWK to solve an IRL problem last year. It really helped solidify my understanding of the language.
The problem was that the Garmin GPS data for a bike ride I had just completed had split into multiple rides. I used AWK to stitch together the data into one file. I also did some basic linear interpolation to fill in missing data points.
The GPS data is formatted as XML and I was able to parse it fairly robustly using AWK.
How did you parse XML with AWK? I would never think of using AWK for XML data. I'd even steer clear of CSV data unless I could guarantee no in-field commas or newlines.
Commas are easy if it's quoted. I just first run an awk script that uses " as the field separator and substitutes or deletes commas in odd numbered fields (as long as that's acceptable for your use case). Then with `-F,` I always check that NF is the same for all lines in the csv before proceeding.
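A sketch of that two-pass approach (assuming no embedded newlines or escaped quotes; when a line doesn't start with a quote, the quoted pieces land in the even-numbered fields):

  # pass 1: neutralize commas inside quoted fields
  awk 'BEGIN { FS = OFS = "\"" }
  {
    for (i = 2; i <= NF; i += 2)        # the pieces inside quotes
      gsub(/,/, ";", $i)
    print
  }' input.csv > cleaned.csv

  # pass 2: stop if any line has a different field count
  awk -F, 'NR == 1 { nf = NF } NF != nf { print "bad line " NR; exit 1 }' cleaned.csv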
Depending on how the xml is structured, it can be possible to just pattern match on the tags if you have something simple to do.
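For example, if the export is GPX, each <trkpt> usually opens on its own line, so something like this gets surprisingly far (a sketch; it breaks if attributes are split across lines):

  # pull latitude and longitude out of <trkpt lat="..." lon="..."> lines
  /<trkpt / {
    lat = $0; sub(/.*lat="/, "", lat); sub(/".*/, "", lat)
    lon = $0; sub(/.*lon="/, "", lon); sub(/".*/, "", lon)
    print lat, lon
  }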
Big, big fan of AWK. It sometimes feels like ancient, alien UNIX technology to me. But lately I've been gravitating more and more towards perl. You can write the same one liners (with perl -e and friends), it has superb support for regexes and it's just a more capable language (as expected, not bashing AWK).
You can use Ruby for this task too. I used to use Perl for throwaway one-liners, but on advice I switched to Ruby because of the bigger community and I'm pretty happy with it.
(Python isn't as nice for one-liner text processing, both because of the lack of Awk heritage--so no built-in regex syntax--and because of the indentation-based syntax requiring newlines for most things.)
grep|ripgrep > awk|sed > most scripting languages > shell
gawk has regexps.
We use Ruby a bit at work. Most coworkers hate it, internal customers scoff at it, and no one's interested in mastering it, using it properly, or even considering fundamental software engineering principles. Tech debt piles up, and no one wants to touch it because there's no performance-review KPI credit for it.
I like your list. As a Ruby writer for many years I’ll add Crystal > Ruby. Being able to deploy a single binary is such a boon that even the other myriad improvements aren’t worth mentioning, especially if we’re in the “what you might do with Awk” territory. Go users probably feel that way too but I know much less about that.
The author is really persuading me to learn awk: he addresses the very reasons I've avoided it, treats them as faulty, and I consider his reasoning decent.
> The absence of GC allows keeping the language implementation very simple, thus fast and portable. Also, with predictable memory consumption. To me, this qualifies AWK as a perfect embeddable language, although, for some reason, this niche is firmly occupied by (GC-equipped) Lua
The absence of a GC is nice for an embedded language, but I don't think that should be the only criterion. Unless you need an embedded language that processes text one line at a time, awk is probably not a good fit.
I used AWK for many years, but one day I realized that I had pushed AWK beyond what it's meant for, same as the author here. Classic red flag from the article:
Not disagreeing with the overall point, but that particular example is from an AWK JSON parser implementation, so the whole point is to do it in AWK. If you look at the entire file, it's not too bad, all things considered.
Funnily enough, the actual Go JSON decoder ends up doing something similar during scanning:
As mentioned, the example you quoted is from a pure-AWK JSON parser. I don't dispute that AWK has issues, but AWK is one of those languages that magically coerces strings to numbers, so you can just write `"1" + 2 + "3.5"` and it'll work.
I am not an expert but I’d say it’s because Awk arrays are associative; they are more like maps than slices, to use Go terminology. And IIRC (it’s been a while) the array values are not strongly typed. So I think you could even say:
a[1] = "hello"
a["world"] = 2
That means that - unlike C arrays - Awk arrays are not a simple, addressable byte range, but a complex data structure with lots of pointers.
I suppose you could come up with a way to serialise the array and pop it on the stack but that would be a lot of work, and for the kind of things I use Awk for, the arrays would often be huge.
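That said, awk does pass arrays into user functions by reference, so the usual workaround is to have the caller supply the array to be filled, something like:

  # fill the caller's array instead of returning one
  function fill(arr,    i) {            # parameters after the gap act as locals
    for (i = 1; i <= 3; i++)
      arr[i] = i * i
  }
  BEGIN {
    fill(result)
    for (i in result)
      print i, result[i]
  }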
I'm not really convinced by that argument. No matter how big your associative array is, it is still represented by a pointer to the initial/root element. It should be quite possible to move the pointer into the caller.
Maybe the language is simpler without it, and that can be a good reason to avoid it. But I don't buy that it has anything to do with GC.
OMG, never realized that $ is an operator you can apply to any expression - plenty of times I needed something like $(NF-1) and instead used verbose stuff like
NF==5 { ... }
NF==6 { ... }
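whereas with $ applied to an expression, one rule covers every field count (printing the second-to-last field is just a made-up action here):

  { print $(NF-1) }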
Right, but somehow they've been the top post for 1/2 hour and I got modded way down for pointing out it was an obvious troll. I hesitate to comment because I assume that's what the script kiddie is looking for out of this.