Brian Kernighan recently told me that "Al and Peter and I" (that is, A, W, and K) are working on a second edition of the book "The AWK Programming Language". The first edition is excellent (it inspired me to write an AWK interpreter in Go). Anyway, I'm really looking forward to seeing what the second edition looks like. I think it'll help renew interest in AWK in general, and no doubt bring some of the examples up to date (though the book has aged very well!).
This is incredible news, thanks for sharing. I would never have guessed it. I wonder what their motivation is (why now?).
The AWK book was one of the fundamental books I used to teach myself some coding. The precision of the language is remarkable; I wonder how much will be different in the new edition.
I try not to plug it every time I see a submission about awk, but here I must recommend freedomben's presentation "awk: Hack the Planet!" [0]
It starts off with similar background and praises for a few minutes before spending the rest of the hour crash coursing you through awk, then leaving you with a very approachable, digestible, and realistic set of exercises (and optionally, a second video covering his solutions).
I keep a text file with a copy of the questions and my solutions on a public gh repo, so I can quickly refer to it from anywhere when needed.
I am much more powerful on the command line because of it.
I owe you one for that recommendation - thanks! The lecture quality is unrivaled compared to the YouTube tutorials of today.
For anyone else considering investing the time: I am extremely satisfied with the 2 hours it took to learn + practice the basics. As far as high-yield learning investments go, I’d already put awk up there with my time spent learning Vim and Git.
You can still do it with awk, but with different "ergonomics":
awk -F= '$2 ~ /[0-9]+/ { print $2 }'
With imaginative choice of FS and RS you can push it very far.
Whether other people having to deal with such code will appreciate your imagination is another matter, though.
Edit: I missed the detail where you want to specifically match "foo" as lhs, and anywhere on the line. So the correct condition would be even lengthier : ^ ) You have a valid point. Captures would provide for shorter patterns.
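For completeness, one way the lengthier version might look without captures, using plain POSIX match() with RSTART/RLENGTH (just a sketch - it doesn't guard against a prefix like "barfoo="):

    # find "foo=<digits>" anywhere on the line, then cut the number back out
    awk 'match($0, /foo=[0-9]+/) {
        s = substr($0, RSTART, RLENGTH)
        sub(/^foo=/, "", s)
        print s
    }'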
I recently saw someone on StackOverflow asking for something similar: access, from the condition-action body, to the text which matched the regex.
So I added a feature to the TXR Lisp awk macro. There is now an Awk variable called `res` which holds the result of the condition. If the condition is a regex, then that has the matching part. The fact that the action is executing tells us that the result of the condition is true; but `res` gives us the specific true value, like the `it` in anaphoric `if` macros.
I don't use `match` often, so I get the `match(str, regex)` order wrong. So yeah, it would be nice if gawk automatically provided the capture groups via some special variable.
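gawk does get you partway there: its three-argument match() fills an array with the capture groups, which is close to that special variable (tiny example, input made up):

    # gawk extension: m[0] is the whole match, m[1] the first group
    echo 'user=alice id=42' | gawk '{
        if (match($0, /id=([0-9]+)/, m))
            print m[1]
    }'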
Not really verbose for this particular example though:
I have to admit I never really learned awk, but whenever I want to do something where I think "awk would be good for this", I use perl. Are there things for which awk is a significantly better tool?
Congratulations (?) for making the top five google results for "ripgrep capture groups" [1].
I cite sources way more often than not; this time I got lazy after dithering over whether to go with the definitive ripgrep source page [2] or a decent-looking third-party(?) tutorial... pressed for time, I did neither.
The above will give all matches in a line, though. You can remove the `(?=\))` part if numbers are always enclosed in `()` or if you don't care about the `)`.
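For anyone reading along without the original command handy, it presumably looked something like this (-o and -P are real ripgrep flags, for only-matching and PCRE2 lookarounds; the pattern itself is my guess):

    rg -o -P '[0-9]+(?=\))' file.txt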
Yep, functions! I used to write a fair amount of Awk code back in the late '80s and early '90s. I treated Awk as a "real" programming language and tried to make the code nice and readable. This of course involved a lot of use of functions.
I only have a couple of surviving examples of the code from back then, but here they are for the curious:
LJPII.AWK is probably the best example. It made a nicely formatted printout of source code on my HP LaserJet II printer. I wish I had one of the printouts it generated, but they are long gone.
Hmm... I wonder if my Brother printer supports the old LaserJet II control codes? Or maybe there is an emulator online?
The code was written for Thompson Awk (TAWK), so some bits would need to be adapted to modern Awks.
You can check if the Brother printer supports PCL, which it likely does. Somewhere online there will be an explanation of the differences between GNU Awk and the version you used.
Awk is an amazingly powerful tool. I remember writing an awk script in an interview once (instead of using Python), and the person giving the interview was amazed at how fast it could be written and how fast it ran.
Poplar is an also-ran (from that time frame). Normally I prefer to concentrate on ideas (of which there are a few in the link below) rather than people ... but in this case, for context, it's worth noting the authors:
In my opinion Perl is much more powerful than Awk, available just about everywhere Awk is available, and even explicitly takes inspiration from Awk for some features (like BEGIN blocks).
I agree. Yet I have used awk in a few places instead. Part of it is that it's easier to get a less powerful language accepted by colleagues, when both are in the "completely foreign to me" category.
Really not a fan of awk. It looks nice, but I've inherited a lot of it in the past and know how many footguns there are. At the risk of trashing the Unix philosophy of using text as a communication protocol: I've seen terrible, terrible mistakes come from repetitively re-parsing data at each pipe step.
The finest was a 3rd party who accidentally added a space in a data feed. This was dutifully sucked out of their SFTP server via a bash script, pre-processed using awk into the standard internal format and then picked up by a cron job which ran a python script to inject it into postgres. The outcome was of course that the columns were offset by 1. This caused a huge asset valuation dip and some market alarm.
A proper parser would have rejected the whole dataset as the data row could not be parsed.
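A tiny illustration with made-up data: default whitespace splitting shifts the columns silently, exactly where a stricter parser would have bailed out:

    echo "ACME 100.5 USD"      | awk '{ print "price:", $2 }'   # price: 100.5
    echo "ACME Corp 100.5 USD" | awk '{ print "price:", $2 }'   # price: Corp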
I get that AWK encourages this type of code, but I would still argue it's not really "an AWK problem". It's more of a problem of how carefully you model your inputs.
Model inputs in great detail and you can throw out a lot of invalid data, but it takes longer to get the code running. Model the input only very crudely and you're up and running quicker, but more open to broken expectations.
Of course, it's way too easy for most programmers to go the latter route...
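In awk terms, the "model it carefully" end of the spectrum looks roughly like this (field count and file name made up): reject any row that doesn't have exactly the expected number of fields instead of letting the columns shift:

    awk -F'\t' '
        NF != 5 { print "bad row at line " NR > "/dev/stderr"; exit 1 }
        { print $3 }
    ' feed.tsv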
I wrote an awk script to rip through supposedly ASCII text files to deal with CP1252 extended-ASCII characters that would creep in. Those characters played havoc with the output of our bindery's commercial inkjet print heads. That stuff was fast on ancient equipment.
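The basic idea fits in a one-liner along these lines (a sketch, not the original script):

    # in the C locale, replace anything outside printable ASCII / whitespace
    # with '?' before it can reach the print heads
    LC_ALL=C awk '{ gsub(/[^[:print:][:space:]]/, "?"); print }' suspect.txt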
They're currently trying to port it to C#, but it's slower going than developing in Delphi; ironically, the thing that's making us move to C# is developer availability.
Thanks for the plug (I'm the author of GoAWK). Yeah, I'm hoping the CSV feature will really be useful for data science and the like. There are so many CSVs pushed around these days. See more here: https://benhoyt.com/writings/goawk-csv/
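For anyone curious, the usage shown in that post is roughly along these lines (going from memory - the link has the exact flags, and the field name here is made up):

    # -i csv parses CSV input, -H reads the header row so fields can be referenced by name
    goawk -i csv -H '{ total += @"amount" } END { print total }' sales.csv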
I find that opinionated tools like awk are esoteric and very niche, but the things they do, they do very well. Writing awk scripts for simple text transformations brings me immense pleasure.
yes, maybe esoteric today, but not when it was the only tool available (~late '80s) across multiple platforms: Mac, Unix flavors, VAX, Sperry Rand, Burroughs, Wang, Sun, Alpha, and x86.
yup. I wonder "what if" awk had added records (dotted r.attrib notation) and namespaces. I created sizeable awk scripts back in the day, and I missed the former dearly and the latter somewhat.
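The usual workaround, then and now, is faking records with composite array keys - exactly the thing dotted notation would have tidied up (field names made up):

    awk '{
        emp[NR, "name"] = $1
        emp[NR, "dept"] = $2
    }
    END {
        for (i = 1; i <= NR; i++)
            print emp[i, "name"], emp[i, "dept"]
    }' staff.txt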
I don't mind AWK for super simple things, but there is a reason why the sysadmins from the Bad Old Days(tm) who now have grey beards all converged to Perl.
Practically everything you can do in AWK can be done just as easily and quickly in Perl. And Perl absolutely wins when you need to do that one extra thing that AWK really just can't do.
And I say this as a person who switched over from Perl to Python eons ago.
Seriously, Perl is an okay language for quick and dirty things of a tiny/small size. Yeah, it's not the best language for a large development project, but if you do need to parse /etc/passwd or something, not only is it perfectly good as-is, but you'll certainly find something on CPAN that already does it well.
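To make that concrete, the kind of quick-and-dirty one-liner I mean (a throwaway sketch - list everyone whose login shell is bash):

    perl -F: -lane 'print $F[0] if $F[6] eq "/bin/bash"' /etc/passwd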
I can't imagine why one would want to do that kind of thing in C. It's just unnecessarily painful, and you'll spend 90% of the time on things that don't solve the actual task you need solved.
Yeah, in modern times it's gone way downhill, but that's mostly if you intend to do something big with it. I wouldn't use it to start a new, fully featured CMS. But for sysadmin type stuff as an alternative to sh/awk it's still just as usable as ever.
I know loads of people loved Perl, and I'm not suggesting that my perspective is mainstream or even defensible. I'm just saying that I was a young sysadmin in the "bad old days", but that I didn't like Perl. Of course I did use an awful lot of Perl scripts.
That said, I have to revise my original comment. I forgot that I was a big fan of Tcl/Tk/Expect back in the day. So it's not like my taste is better than anyone else's :)
I have a perennial fascination with awk. It's one of the first things you find in /bin alphabetically, on almost any Unix. It's small but powerful, and somewhat mysterious.
Sadly I've learned it & cheatsheeted it for future reference but never find myself reaching for it. Part of it is preferring Python over shell scripting, maybe - awk fits better in a shell-scripting world.
I'm using a 20 KiB awk script that I wrote from scratch to calculate taxes owed on investments. German tax code is a bit tricky when your broker is abroad because you have to calculate everything yourself. Hilariously, some German brokers don't even apply the FIFO rule correctly. Never mind regulations about fees or double-taxation treaties. FX rates for conversion into EUR are extracted either with rga from Swissquote's own PDFs or downloaded off the internet. The transaction history itself is one big CSV export from their web site that is also parsed and analyzed using awk. At tax time I call the script with the year I want and transfer everything into the official forms. Without spending 3 days in Excel. Or 10. I can't praise awk enough.
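Nothing like the real script, obviously, but the general shape is just awk over a CSV - pick out one tax year and total a column (column layout made up):

    awk -F, -v year=2022 'substr($1, 1, 4) == year { total += $5 }
        END { printf "%.2f\n", total }' transactions.csv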
I don’t know any programming languages…but I feel like I “get” awk…
Also, this blog post helps with the above. For some reason I like how the author cites technical reviews of the article and email correspondence as references.
Missed opportunity to use one of my favourite words: grok [0].
And congratulations. I’ve been trying for the past year to learn awk, and while I can pretty reliably split text files and extract columns, I’m pretty far from being able to grok awk.
http://web.archive.org/web/20230115205059/https://www.fossli...