
Thanks for sharing this. I'm the author.

When I wrote my introduction to JQ, someone mentioned JQ was tricky but super-useful, like AWK. I nodded along, but actually I had no idea how Awk worked.

So I learned how it worked and wrote this up. It's a bit long, but if you don't know Awk that well, or at all, I think it should get the basics across by stepping through an examination of the book reviews for The Hunger Games trilogy.

Let me know what you think. And also let me know if you have any interesting Awk one-liners to share.



The funny thing is, by and large my only use case for awk is to print out whitespace delimited columns where the amount of whitespace is variable. Surprisingly hard to do with other Unix tools.
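awk fits that use case because its default field splitting treats any run of spaces or tabs as one delimiter and ignores leading whitespace. A minimal sketch (the input and field numbers are made up for illustration):

```shell
$ printf 'a b  c d   e\n1 2  3 4 5\n' | awk '{print $2, $5}'
b e
2 5
```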

Neat discussions around that sort of thing at least here: https://news.ycombinator.com/item?id=23427479


The syntax isn't nearly as nice, but Perl can be handy if you're doing something more after splitting into columns. And it's usually already there / installed, like awk. For just columns:

  $ printf "a b  c d   e\n1 2  3 4 5" | perl -lanE 'say "$F[2] $F[4]"'
  c e
  3 5


It surprised me that AWK had dictionaries and no declaration of vars, which makes it feel like a modern scripting language even though it was written in the 70s.

It turns out, though, that this is because Perl and later Ruby were inspired by AWK and even support these line-by-line processing idioms with BEGIN and END as well.

    ruby -n -a -e 'puts "#{$F[0]} #{$F[1]}"'

    ruby -ne '
    BEGIN { $words = Hash.new(0) }

    $_.split(/[^a-zA-Z]+/).each { |word|
      $words[word.downcase] += 1 }

    END {
        ...
    }'


I think it's pretty obvious that awk syntax is ultimately the main inspiration for JavaScript syntax, with optional semicolon as stmt terminator, regexp literals, for (x in y), the function keyword, a[x] associative array accessors, etc.
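As a hedged illustration, this small awk program (the input and the `count` array are invented for the example) uses a regexp literal as a pattern, an associative array, and `for (x in y)` — all of which would look familiar to a JavaScript reader:

```shell
$ printf 'apple\nbanana\napple\n' | awk '
    /a/ { count[$1]++ }                         # regexp literal as a pattern
    END { for (k in count) print k, count[k] }  # for-in over array keys
  ' | sort
apple 2
banana 1
```

(The `sort` is only there because awk's for-in iteration order is unspecified.)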


They spent a lot of time making one-liner Perl able to handle most of the functionality of awk, sed, et al.


A long while ago I wrote up a little processor to determine field lengths in a given file - I forgot the original reason. ( https://github.com/sullivant/csvinfo )

However, I feel I really should have taken the time to learn Awk better as it could probably be done there, and simply! (It was a good excuse to tinker with rust, but that's an aside.)
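For what it's worth, a hedged awk sketch of the same idea — per-column maximum field lengths, assuming a simple CSV with no quoted commas — fits in a few lines:

```shell
$ printf 'id,name\n1,alice\n22,bo\n' | awk -F, '
    { if (NF > nf) nf = NF          # remember the widest row seen
      for (i = 1; i <= NF; i++)
        if (length($i) > max[i]) max[i] = length($i) }
    END { for (i = 1; i <= nf; i++) print "col " i ": " max[i] }'
col 1: 2
col 2: 5
```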


For some idea, a one liner to find the (last) longest username and length in /etc/passwd:

  $ awk -F: '{len=length($1);if(len>max){max=len;user=$1}}END{print user,max}' /etc/passwd


Thanks for that reply! It's good to work with an example.


I'll mark this on my GitHub when I get back to a computer. I take public datasets and make graphs, transforms, and reports. The big survey companies have weird data records, and having to write a parser is my least favorite part. I think other people who ingest my content don't appreciate the effort, but that's a near-universal feeling, I think, heh.


If I don't use awk, I throw tr -s ' ' into the pipeline; then the delimiter is a single space, so you can just cut.


That will collapse multiple spaces, but won't handle a mix of spaces and tabs, which awk will handle.
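One workaround, sketched here: with two sets, tr translates characters from the first set into the second (padding the second with its last character), and -s then squeezes the repeats, so tabs can be folded into the space set first:

```shell
$ printf 'a \t b\tc\n' | tr -s ' \t' ' ' | cut -d ' ' -f 3
c
```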


choose from your link does look nice for simple column selection.

   echo -e "foo   bar   baz" | choose -1 -2
vs awk's

   echo -e "foo   bar   baz" | awk '{ print $2, $3}'
I love the effort people are putting into reinventing the core unix tools.

I think I'll stick with Awk for now though.


The problem with new tools is

$ choose

bash: choose: command not found...


  ls -l | tr -s ' ' | cut -d ' ' -f 5


Exactly! Exactly! And now fix it to work with tabs :-)


And leading whitespace. Compare:

  $ printf " one two  three"  | tr -s ' ' | cut -d ' ' -f 1

  $ printf " one two  three"  | awk '{print $1}'
  one


  ps ax | sed 's/^\s\+//; s/\s\+/ /g;' | cut -d ' ' -f 4


  echo -e '1\t2\t3\t4\t5' | expand -t 1 | cut -d ' ' -f 3


I really appreciate you writing this guide. As a long time Linux user, I've always wanted to learn AWK, but it seemed too daunting. Three minutes into your guide and I immediately saw how I could use it in my day-to-day usage.


I blame GNU's man page. I was in the same situation for the longest time, but stumbled over a man page for a simpler implementation of awk (plan9's, in my case) and learned it in 10-15 minutes (not claiming I understood it more than partially in that time of course, but enough to write my own small programs).

Since then I've made a point of finding man-pages from other systems whenever the manual for a GNU tool is a bit daunting. It tends to lower the learning threshold quite a lot, honestly.

    $ man gawk | wc
       1568   13030   94207
    $ man -l /usr/share/man/man1/awk.1plan9.gz | wc 
        214    1579   10956
Not trying to detract from this great guide. Just a general tip :)


I tend to use cat-v since you can check really old versions which tend to be far simpler.

http://man.cat-v.org/unix_7th/1/awk


Thank you! It took me longer to write than I expected it would. I was originally just going to do some small examples of each idea.

But once I got the idea of aggregating the book review data from amazon I felt I had to see it through.


As someone who's never used awk before, I really enjoyed this write-up and I think it was very well written!


Chiming in: I had a feeling that the article and the comments here would contain some jewels, and both have exceeded expectations.



