When I wrote my introduction to JQ someone mentioned JQ was tricky but super-useful like AWK. I nodded along with this, but actually, I had no idea how Awk worked.
So I learned how it worked and wrote this up. It is a bit long, but if you don't know Awk that well, or at all, I think it should get the basics across to you by going step by step through examining the book reviews for The Hunger Games trilogy.
Let me know what you think. And also let me know if you have any interesting Awk one-liners to share.
The funny thing is, by and large my only use case for awk is to print out whitespace delimited columns where the amount of whitespace is variable. Surprisingly hard to do with other Unix tools.
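For anyone who hasn't seen that use case, a minimal sketch: awk's default field splitting treats any run of spaces or tabs as one separator, so the columns line up no matter how ragged the spacing is. The `cols` function name and the sample input are made up for illustration.

```shell
# Hypothetical helper: print the 3rd and 5th whitespace-delimited columns.
# With no -F given, awk splits on runs of blanks and tabs.
cols() {
  awk '{ print $3, $5 }'
}

# Sample input (invented) with irregular spacing and a tab
printf 'a   b\tc  d e\n1 2 3 4 5\n' | cols
```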
The syntax isn't nearly as nice, but Perl can be handy if you're doing something more after splitting into columns. And it's usually already there / installed, like awk. For just columns:
$ printf "a b c d e\n1 2 3 4 5" | perl -lanE 'say "$F[2] $F[4]"'
c e
3 5
It surprised me that AWK has dictionaries and needs no variable declarations, which makes it feel like a modern scripting language even though it was written in the 70s.
It turns out, though, that this is because Perl and later Ruby were inspired by AWK and even support these line-by-line processing idioms with BEGIN and END as well.
ruby -n -a -e 'puts "#{$F[0]} #{$F[1]}"'
ruby -ne '
BEGIN { $words = Hash.new(0) }
$_.split(/[^a-zA-Z]+/).each { |word|
$words[word.downcase] += 1 }
END {
...
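For comparison, here is a rough sketch of a word count in awk itself, showing the undeclared associative array and the END block. The `wordcount` name, the sample text, and the output format (count, then word, most frequent first) are my own assumptions, not a reconstruction of the snippet above.

```shell
# Hypothetical helper: count words on stdin, most frequent first.
wordcount() {
  awk '
    {
      # Split the lowercased line on runs of non-letters.
      n = split(tolower($0), parts, /[^a-z]+/)
      for (i = 1; i <= n; i++)
        if (parts[i] != "")
          words[parts[i]]++    # words[] springs into existence on first use
    }
    END {
      # Iteration order of "for (w in words)" is unspecified, so sort afterwards.
      for (w in words) print words[w], w
    }
  ' | sort -k1,1nr -k2
}

# Sample input (invented)
printf 'The cat and the hat.\nThe end\n' | wordcount
```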
I think it's pretty obvious that awk syntax is ultimately the main inspiration for JavaScript syntax, with optional semicolons as statement terminators, regexp literals, for (x in y), the function keyword, a[x] associative array accessors, etc.
A long while ago I wrote up a little processor to determine field lengths in a given file - I forgot the original reason. ( https://github.com/sullivant/csvinfo )
However, I feel I really should have taken the time to learn Awk better as it could probably be done there, and simply! (It was a good excuse to tinker with rust, but that's an aside.)
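As a rough idea of what that might look like in awk (I haven't checked what csvinfo actually reports, so this is only a guess at the core of it): track the longest value seen in each column and print the maxima at the end. The `fieldlens` name and the sample rows are invented, and the naive `-F,` split will miscount CSV files with quoted commas.

```shell
# Hypothetical helper: print "column_number max_length" for comma-separated input.
# Assumes no quoted fields containing commas.
fieldlens() {
  awk -F, '
    {
      if (NF > n) n = NF                 # remember the widest row
      for (i = 1; i <= NF; i++)
        if (length($i) > max[i]) max[i] = length($i)
    }
    END {
      for (i = 1; i <= n; i++) print i, max[i] + 0
    }
  '
}

# Sample input (invented)
printf 'id,name,city\n1,Katniss,District 12\n22,Peeta,D12\n' | fieldlens
```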
I'll mark this on my GitHub when I get back on a computer. I take public datasets and make graphs, transforms, and reports. The big survey companies have weird data records, and having to write a parser is my least favorite part. I think other people who ingest my content don't appreciate the effort, but that's a near-universal feeling I think, heh.
I really appreciate you writing this guide. As a long time Linux user, I've always wanted to learn AWK, but it seemed too daunting. Three minutes into your guide and I immediately saw how I could use it in my day-to-day usage.
I blame GNU's man page. I was in the same situation for the longest time, but stumbled over a man page for a simpler implementation of awk (plan9's, in my case) and learned it in 10-15 minutes (not claiming I understood it more than partially in that time of course, but enough to write my own small programs).
Since then I've made a point of finding man-pages from other systems whenever the manual for a GNU tool is a bit daunting. It tends to lower the learning threshold quite a lot, honestly.
$ man gawk | wc
1568 13030 94207
$ man -l /usr/share/man/man1/awk.1plan9.gz | wc
214 1579 10956
Not trying to detract from this great guide. Just a general tip :)