
Thanks for sharing this. I'm the author.

When I wrote my introduction to JQ, someone mentioned JQ was tricky but super-useful, like AWK. I nodded along, but actually I had no idea how Awk worked.

So I learned how it worked and wrote this up. It's a bit long, but if you don't know Awk that well, or at all, I think it should get the basics across by stepping through an examination of the book reviews for The Hunger Games trilogy.

Let me know what you think. And also let me know if you have any interesting Awk one-liners to share.



The funny thing is, by and large my only use case for awk is to print out whitespace delimited columns where the amount of whitespace is variable. Surprisingly hard to do with other Unix tools.
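awk fits that use case because its default field splitting treats any run of spaces or tabs as one delimiter and ignores leading whitespace. A minimal sketch (the input and field numbers are made up for illustration):

```shell
$ printf 'a b  c d   e\n1 2  3 4 5\n' | awk '{print $2, $5}'
b e
2 5
```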

Neat discussions around that sort of thing at least here: https://news.ycombinator.com/item?id=23427479


The syntax isn't nearly as nice, but Perl can be handy if you're doing something more after splitting into columns. And it's usually already there / installed, like awk. For just columns:

  $ printf "a b  c d   e\n1 2  3 4 5" | perl -lanE 'say "$F[2] $F[4]"'
  c e
  3 5


It surprised me that AWK had dictionaries and no declaration of vars, which makes it feel like a modern scripting language even though it was written in the 70s.

It turns out, though, that this is because Perl and later Ruby were inspired by AWK and even support these line-by-line processing idioms with BEGIN and END as well.

    ruby -n -a -e 'puts "#{$F[0]} #{$F[1]}"'

    ruby -ne '
    BEGIN { $words = Hash.new(0) }

    $_.split(/[^a-zA-Z]+/).each { |word|
      $words[word.downcase] += 1 }

    END {
        ...
    }'


I think it's pretty obvious that awk syntax is ultimately the main inspiration for JavaScript syntax, with optional semicolon as stmt terminator, regexp literals, for (x in y), the function keyword, a[x] associative array accessors, etc.
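As a hedged illustration, this small awk program (the input and the `count` array are invented for the example) uses a regexp literal as a pattern, an associative array, and `for (x in y)` — all of which would look familiar to a JavaScript reader:

```shell
$ printf 'apple\nbanana\napple\n' | awk '
    /a/ { count[$1]++ }                         # regexp literal as a pattern
    END { for (k in count) print k, count[k] }  # for-in over array keys
  ' | sort
apple 2
banana 1
```

(The `sort` is only there because awk's for-in iteration order is unspecified.)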


They spent a lot of time making one-liner Perl able to handle most of the functionality of awk, sed, et al.


A long while ago I wrote up a little processor to determine field lengths in a given file - I forgot the original reason. ( https://github.com/sullivant/csvinfo )

However, I feel I really should have taken the time to learn Awk better as it could probably be done there, and simply! (It was a good excuse to tinker with rust, but that's an aside.)
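For what it's worth, a hedged awk sketch of the same idea — per-column maximum field lengths, assuming a simple CSV with no quoted commas — fits in a few lines:

```shell
$ printf 'id,name\n1,alice\n22,bo\n' | awk -F, '
    { if (NF > nf) nf = NF          # remember the widest row seen
      for (i = 1; i <= NF; i++)
        if (length($i) > max[i]) max[i] = length($i) }
    END { for (i = 1; i <= nf; i++) print "col " i ": " max[i] }'
col 1: 2
col 2: 5
```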


For some idea, a one liner to find the (last) longest username and length in /etc/passwd:

  $ awk -F: '{len=length($1);if(len>max){max=len;user=$1}}END{print user,max}' /etc/passwd


Thanks for that reply! It's good to work with an example.


I'll mark this on my GitHub when I get back to a computer. I take public datasets and make graphs, transforms, and reports. The big survey companies have weird data records, and having to write a parser is my least favorite part. I think other people who ingest my content don't appreciate the effort, but that's a near-universal feeling, I think, heh.


If I don't use awk, I throw tr -s ' ' into the pipeline; then the delimiter is a single space, so you can just cut.


That will collapse multiple spaces, but won't handle a mix of spaces and tabs, which awk will handle.
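One workaround, sketched here: with two sets, tr translates characters from the first set into the second (padding the second with its last character), and -s then squeezes the repeats, so tabs can be folded into the space set first:

```shell
$ printf 'a \t b\tc\n' | tr -s ' \t' ' ' | cut -d ' ' -f 3
c
```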


choose from your link does look nice for simple column selection.

   echo -e "foo   bar   baz" | choose -1 -2
vs awk's

   echo -e "foo   bar   baz" | awk '{ print $2, $3}'
I love the effort people are putting into reinventing the core unix tools.

I think I'll stick with Awk for now though.


The problem with new tools is

$ choose

bash: choose: command not found...


  ls -l | tr -s ' ' | cut -d ' ' -f 5


Exactly! Exactly! And now fix it to work with tabs :-)


And leading whitespace. Compare:

  $ printf " one two  three"  | tr -s ' ' | cut -d ' ' -f 1

  $ printf " one two  three"  | awk '{print $1}'
  one


  ps ax | sed 's/^\s\+//; s/\s\+/ /g;' | cut -d ' ' -f 4


  echo -e '1\t2\t3\t4\t5' | expand -t 1 | cut -d ' ' -f 3


I really appreciate you writing this guide. As a long time Linux user, I've always wanted to learn AWK, but it seemed too daunting. Three minutes into your guide and I immediately saw how I could use it in my day-to-day usage.


I blame GNU's man page. I was in the same situation for the longest time, but stumbled over a man page for a simpler implementation of awk (plan9's, in my case) and learned it in 10-15 minutes (not claiming I understood it more than partially in that time of course, but enough to write my own small programs).

Since then I've made a point of finding man-pages from other systems whenever the manual for a GNU tool is a bit daunting. It tends to lower the learning threshold quite a lot, honestly.

    $ man gawk | wc
       1568   13030   94207
    $ man -l /usr/share/man/man1/awk.1plan9.gz | wc 
        214    1579   10956
Not trying to detract from this great guide. Just a general tip :)


I tend to use cat-v since you can check really old versions which tend to be far simpler.

http://man.cat-v.org/unix_7th/1/awk


Thank you! It took me longer to write than I expected it would. I was originally just going to do some small examples of each idea.

But once I got the idea of aggregating the book review data from amazon I felt I had to see it through.


As someone who's never used awk before, I really enjoyed this write-up and I think it was very well written!


Chiming in: I had a feeling that the article and the comments here would contain some jewels, and both have exceeded expectations.



