I was privileged to be one of the technical reviewers for this book. There's a fair bit of the original content (which is still great), but Kernighan's done a great job with some good restructuring and some significant updates, too. The early chapters are very hands-on, with something of a focus on "exploratory data processing", particularly with CSV files. Big data with AWK, you could say.
Gawk and awk will soon have a new "--csv" option that enables proper CSV input mode (parsing files with quoted and multiline fields per the CSV RFC). I'm really glad Arnold Robbins added a robust "--csv" implementation to Gawk, too, because that's really the most-heavily used version of AWK nowadays. I've already got CSV support in my own GoAWK implementation, and I'll be adding "--csv" to make it compatible.
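Once the flag lands, usage should look roughly like this (a sketch based on the description above; the data file is hypothetical):

    # Print the second column of a CSV file, with quoted commas and
    # embedded newlines handled properly instead of naively splitting on ","
    awk --csv '{ print $2 }' people.csv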
I'm really glad this new updated version is coming out!
It's a crying shame we never settled on a control-character-separated text format. There are ASCII control characters for record and field (unit) separators. A bit of user-space support for that would have been great.
As I recall, you can tell Awk to use the control characters as record and field separators. Not helpful if you're getting your data from others, but if you're working by yourself, you have the option. I've come to use control characters as a default because it makes life so much easier.
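For example (a sketch using the ASCII unit separator, 0x1F, as the field separator and the ASCII record separator, 0x1E, as the record separator; octal escapes are the portable way to write them in awk strings):

    # Field separator = ASCII US (\037), record separator = ASCII RS (\036)
    awk 'BEGIN { FS = "\037"; RS = "\036" } { print $2 }' data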
lolive, VisiData has some Excel support. However, don't expect VisiData to be a full blown editor for Excel files. It can provide a view of the data in an Excel spreadsheet.
If you have a python installation available, openpyxl[1] is great both for converting to .csv and for packaging .csv outputs as .xlsx (which is really zipped .xml, anyway).
It is a shame. I have been using tab-separated sheets recently as it allows me to simply not care about almost any possible character in my strings...apart from tabs of course. But those are far less common than commas, and putting strings in quotes 100% of the time looks messy to me.
To be really useful as a format it would just need for text editors to:
- display something distinct for the field separator (some editors do this)
- treat the record separator character like a carriage return (not aware of any editors that do this)
Tab-delimited "csv" formats are quite common (e.g. the CONLL format family for many natural language processing tasks) and also supported by common tools such as MS Excel for decades already.
Awk is really great. For those who know nvm [1]: I used awk to make `nvm ls-remote` run more than 10 times faster [2] by replacing the related shell script with around 60 lines of awk [3], and I was quite happy with the improvement.
It's not really a one-liner, nor anything big, but you can take it as an example of how awk really isn't just for one-liners.
Meanwhile, having `--csv` support is really nice. I'd also like to see things like a builtin `length` function become standard.
But length() is standard POSIX, no? Even length(array) has been approved by POSIX [1] but not yet included in the spec (they're very slow to update the spec for some reason). Both forms have been supported in onetrueawk, Gawk, mawk, and Busybox awk for a long time.
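For reference, both forms on one line (this should behave the same in any of those implementations):

    awk 'BEGIN { split("a,b,c", arr, ","); print length("hello"), length(arr) }'
    # prints "5 3": length of the string, and number of elements in the array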
Our data product is delivered in CSV format. Even though I create user documentation mainly using csvkit, grep and sed, I would love to convert all those solutions to AWK. Sometimes AWK is more readable than sed, and csvkit requires installation.
It would be nice to have an AWK cookbook for CSV. In terms of CSV manipulation and querying, there is only a limited number of operations, and I think there is potential to standardize those operations using AWK.
It's nice that everyone is supporting this, I've written a portable awk module that takes control of the parsing and it is SLOW (and a little buggy). I'm a little bummed that nobody will use it but this is truly a step in the right direction.
I guess for people who are still using nawk, you can set up an AWK envvar so you can { awk -f $AWKU/ucsv.awk -f <(echo '{print NR, $1}') }
Would you say the first few chapters are enough to get the 75-80% usefulness for mere mortals like me who will never try to master the full language? Or is the material fairly sprinkled throughout the whole tome?
Yes, definitely. The first three chapters would be more than enough for that: 1) An Awk Tutorial, 2) Awk in Action, and 3) Exploratory Data Analysis. For most people who just want to use AWK for one-liners on the command line, you can stop there. The rest of the chapters are about writing larger (still small! but not one-liner) programs in AWK to create reports, little languages, and experiment with algorithms.
Fantastic news. I’ve tried lots of new CLI tools but they always seem to fall between too little functionality (eg. xsv) and too much (VisiData). AWK is just right.
Awk is awesome! Glad that they are looking to modernize the book. It wasn't really necessary, all the code examples in the original edition of the book still run just fine, although some are somewhat dated, like printing ASCII bar graphs. They also had examples of writing VMs, parsers and interpreters in the book, which run on modern implementations.[0]
The language has some quirks. To declare temporary variables, it's common practice to add extra arguments to functions that won't be used. And traversal of associative arrays is implementation-dependent. I'm not sure what the situation is regarding locale and UTF-8 support.
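The extra-argument convention looks like this (a small made-up example; the parameters after the gap are never passed by callers and act as locals):

    # sum_fields is a hypothetical helper: i and total are "locals",
    # declared as extra parameters that callers simply don't pass.
    function sum_fields(n,    i, total) {
        total = 0
        for (i = 1; i <= n; i++)
            total += $i
        return total
    }
    { print sum_fields(NF) }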
EDIT: Looks like Brian Kernighan added Unicode support last year.[1]
What would you suggest as an alternative to printing ASCII bar graphs? I do that all the time. Takes 20 seconds and often makes distributions, modalities, and patterns over time obvious right away.
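For anyone who hasn't tried it, the kind of thing I mean is only a few lines of awk (a rough sketch; assumes one numeric value per input line):

    # Scale each value to a bar of at most 50 '#' characters.
    { v[NR] = $1; if ($1 > max) max = $1 }
    END {
        for (i = 1; i <= NR; i++) {
            n = (max > 0) ? int(50 * v[i] / max) : 0
            bar = ""
            for (j = 0; j < n; j++) bar = bar "#"
            printf "%10g %s\n", v[i], bar
        }
    }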
`sparklines`[1] is good for an overall low-res view. `termgraph`[2] is sometimes better for a higher-res, more capable view (but can be finicky about the data.)
Sure but, e.g., sparklines can show me the shape of my 60 numbers[1] more effectively on a single line of 60 characters[2] than an ASCII bar chart which would be 60 lines (without binning).
Is there a particular benefit in writing a VM in AWK, placed in a big BEGIN block? Very similar code can be written in Perl or Python. Isn't the strength of AWK in its line-matching capability, being able to pattern-match a line against a block of code?
> Is there a particular benefit in writing a VM in AWK
Not really. Later on the book just ran out of line-matching examples to go through and started doing regular programming instead :P. When I actually write AWK code I rely on line-matching and using a variable to handle state.
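The state-variable pattern I mean is roughly this (marker names invented for illustration):

    # Flip a flag on marker lines and act only while inside a block.
    /^BEGIN-SECTION$/ { in_section = 1; next }
    /^END-SECTION$/   { in_section = 0; next }
    in_section        { print "inside:", $0 }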
At the time, awk was the only scripting language (other than shell) generally available on Unix systems. Perl, Tcl, Python didn't exist yet. So awk was often used for general-purpose programming.
There are many systems which lack Perl or Python, but include awk.
You might be carrying an Android device at the moment --- if you drop to its default userland, that provides a bunch of utilities, including awk, via Busybox. But not, so far as I'm aware, either Perl or Python.
(You can of course install Termux which will then give you both Perl and Python, along with Node.js, ruby, and a whole slew of other scripting and compiled languages. But so long as we're considering stock installs, it's sed and awk.)
awk can be mastered by just reading the man page. The book doesn't take long to read either. Once you understand the simple principles, you can write an infinite number of scripts for all kinds of tasks.
See, when I'm writing a shell script interactively and work myself into a corner, I reach for awk, struggle with it for a bit, and then either:
1) succeed, and regret the messiness of the solution
or
2) fail, and find a non-awk way to handle it.
I really tried to like awk, but its portability hasn't been enough of a feature to raise it above other scripting languages for me. Especially if I'm going to end up in an editor.
"Dark corners are basically fractal - no matter how much you illuminate, there is always a smaller but darker one." - - Brian Kernighan (quoted in the GNU Awk book)
Awk has always been a language that I loved but have struggled to use beyond quick jobs parsing text files. I understand it is meant to be used for exactly that, but the fact that it is simple, fast and lightweight sometimes makes me want to do something more with it; yet when I start trying to do something besides parsing text, I find that it starts becoming awkward (pun intended?).
> but the fact that it is simple, fast and lightweight
I see awk as a DSL to be honest. Yes, it can be used as a general purpose language, but that quickly becomes, as you say, awkward :D
Like many DSLs, it is simple, fast and lightweight as long as it is used for its intended purpose. Once you start using it for something else, these advantages evaporate pretty quickly, because then you have to essentially work around the DSL design to get it to do what you want.
One simple thing I do with awk is to create a command processor: read one line at a time and do things on my data as a response. This is very useful because you can make your command as powerful as needed and call other unix tools as a result.
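A minimal sketch of the idea (the command names here are made up; the point is that each input line gets dispatched to whatever action you like, including calls to other tools):

    # Run with: awk -f repl.awk   (then type commands on stdin)
    $1 == "hello" { print "hi there"; next }
    $1 == "run"   { sub(/^run[ \t]+/, ""); system($0); next }
    $1 == "quit"  { exit }
                  { print "unknown command:", $0 }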
I find it pretty nice for writing simple preprocessors. For example I have one which takes anything between two marker lines and pipes it through a command (one invocation per block). Awk has an amazing pipe operator which lets you do something like this:
... {
    print $0 | "command"
}
"command" is executed once, and the pipe is kept open until closed explicitly by close("command"), at which point the next invocation will execute it again. The command string itself acts as a key for the pipe file descriptor.
And of course, no mention of awk is complete without the "uniq" implementation, which beats the coreutils uniq in every way possible (by supporting arbitrary expressions as keys and not requiring sorted input):
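(Presumably the familiar one-liner; the key can be any expression you index the array with:)

    awk '!seen[$0]++' file       # whole line as the key
    awk '!seen[$1,$3]++' file    # or, say, fields 1 and 3 as the key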
I had no idea about this "keep the pipe open" behaviour. I thought it would spawn the binary on every print statement and thus didn't consider it in the past. But now...
This is exactly why I moved from AWK to Perl for these quick jobs a couple of years ago. If you stick to an AWK-like subset, Perl is also simple, fast and lightweight. If you want to grow your scripts (and you have a lot of discipline) Perl – in contrast to AWK – gives you enough noose to hang^W^W^W^Wthe tools you need.
I write bash, python and nodejs all day, and have no professional history with Perl.
One day while avoiding working on something important, I spent half a day learning Perl in order to implement something related to a build tool that was being used in the important thing I was avoiding.
I was blown away. It's a really delightful language. Its big downfall is that it makes it feel good to do something "clever."
Perl is a joy to write, and a devil to read. I liked it, and wish I had started my career earlier so I could have enjoyed Perl in its heyday.
The same shortcut syntax that people complain about does make perl really handy for one-time tasks where you're iterating on ideas. Lots of features there that make that easy. One example:
#!/usr/bin/perl
while (<>) {
    # various processing here
    # $ARGV is set to either "-" for piped input, or the current filename
    # $_ is the data of the current line
}
That (<>) construct accepts data from stdin, redirection or file(s) named as arguments and iterates over the data. There's lots of things like that throughout the language.
> Perl? Wow. Is that better than bash, python or even nodejs? Why write in Perl over these?
It depends on scale.
If you have some quick parsing to do, then awk will get you started quickly, but as you expand your experimentation on what you want to extract/manipulate, it may not be easy to add onto the awk beginnings of your "one liner".
But if you start with awk-like† syntax but invoking it with Perl, then if you find you have to expand, Perl has more elbow room.
The intention is not to 'go big', which those other languages may be better at, but to more easily 'start small'.
† IIRC, Larry Wall wanted a utility that had awk/(s)ed-like syntax for text manipulation, just 'with more'.
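The awk-like entry point is Perl's autosplit mode, roughly like this (a sketch with a made-up condition; -n loops over lines, -a splits each line into @F, -l handles newlines):

    # $F[0], $F[1], ... play the role of awk's $1, $2, ...
    perl -lane 'print $F[0] if $F[2] > 100' data.txt
    # roughly equivalent to:
    awk '$3 > 100 { print $1 }' data.txt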
Have you ever tried to dig a hole? What tool did you use?
- Want to cut through and move loam, compost, sandy, and compacted soil? You're gonna want a rounded shovel.
- Want to break up rocky, clay soil? A pick mattock will penetrate deep, breaking up soil, shattering smaller rocks, and is used as a lever to uproot. A tiller is a faster method but disturbs the soil more.
- Want to dig a narrow, deep hole? An auger will quickly break up rocks and soil in a shaft and move them upwards.
What do you use the Perl tool for?
- Quickly and efficiently open files, read line by line, analyze text, and perform any kind of operation you can think of, with complex data structures, objects and modular code, using very few lines of code.
- Executing external commands with a shell, returning their output, and making complex yet short programs easily with arguments to the interpreter from a command line.
Absolutely. It is comparable to python in some ways, but makes it much easier to write quick one-liners using regexes and data manipulation, and to scale those up to real programs. It fills the gap between bash scripts using awk, grep and sed, and C/java/C#. Compared to bash scripting, perl is a real programming language. The documentation and library ecosystem are excellent, backwards compatibility is legendary, yet it supports modern Unicode. The syntax is weird, but try it for a bit, read the man pages, it's not that hard. The OO system is weirder, and I wouldn't make complex class hierarchies in it, but it is usable.
I like how Awk is just a single executable. A single-executable Perl that includes only the core library would be great. There is Microperl [0, 1], but no idea how well it compiles with more up-to-date Perl versions.
It can be very useful and they are pretty robust.
I often found Perl scripts running for years and years without issues at different companies.
My main issue with Perl-scripts is that they often are not "readable" by anybody but the original creator, who of course has left the company. (Not a fault of Perl itself, though.)
But your mileage may vary, and any script can be made (un)readable.
I've always found it weird that people bash on Perl relentlessly for being hard to read and then turn around and praise Rust's syntax when it is full of stuff like this:
>> My main issue with Perl-scripts is that they often are not "readable" by anybody but the original creator.
Anyone writing Perl scripts like this should not be trusted with any programming language.
Perl scripts are no less readable than bash scripts or Awk scripts. This is because so much of Perl was written to do the same work as bash, awk, sed, and the other related Unix text processing command line programs, but all under one roof.
The truth is that insulting Perl is considered stylish by some, so many people do it despite knowing little to nothing about Perl and never having used it.
However, if you want Perl to be hilariously unreadable, why not write it in Latin:
There's a limited problem domain where it's unquestionably the best. Perl beats awk and bash at their own game on their home turf. That's the best way to put it. It's faster, has more shortcuts, less warts, more power, and more readability when well written, and while aged and not huge by modern standards, CPAN (like pypi or npm) is incredible for a hyper-powered awk and bash mash-up for those tasks at the edge of of that limited problem domain. It's installed almost everywhere, so almost always available.
That stuff is just awkward and painful in Python by comparison.
I don't write Perl code, but its CLI has been a very good way to replace sed with something decent. sed not supporting Perl regex syntax, the most common kind of regex out there by far, is frankly disappointing. Even grep was able to put it together and add the -P switch. But sed is still stuck in the prehistoric syntax of ERE ("Extended Regular Expressions", as described in man pages), which e.g. instead of \d for a digit uses [[:digit:]], a syntax present in... zero? other tools or programming environments.
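A concrete example of the difference (the file name is a placeholder; both commands should do the same thing):

    # perl lets you write \d:
    perl -pe 's/\d+/N/g' numbers.txt
    # sed needs the POSIX character class (and -E for ERE):
    sed -E 's/[[:digit:]]+/N/g' numbers.txt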
Better than BASH? Mostly. Better than Python, subjective as you would have to use them both yourself. I lean towards Perl as I like sigils to denote things. I have nothing against Python though. Both are typically installed as a default now. I have never used nodejs for sys admin work.
Perl is super-specialized at reporting (that's in fact the "r" in Perl). In particular there's a bunch of extremely useful implicitly defined variables that take their context from your place in a line-by-line loop through a text file.
Perl is a great language, but please listen to this old perl programmer's advice:
1. You can write totally unreadable perl. It is probably the single worst language in this regard most programmers will run into. Be careful to make your code readable.
2. Keep your perl programs small. 200-300 lines is a good limit.
So for quick bang it out scripts that want to parse text etc... perl is great. For writing a major application, not so much.
I have found a handful of unconventional applications for awk -- I once needed a tiny pcm pulsewave generator, and awk was surprisingly decent for the job [1].
Aside from that I've mostly been using it for quick statistics [2], but it quickly moves into perl territory...
It's a language for creating quick alternative views from line- and column-oriented text streams. That means, take the output of another tool and represent it in a different way.
Ok, dumb question: Is the link supposed to link to the actual book (i.e., is the book free and/or open source) or is this just a page of miscellaneous interesting links about the book (which we can pay for, later, when it's published).
I was expecting the book, but the page itself says "This page is a placeholder for material related to the second edition of The AWK Programming Language."
It's fine if this is a placeholder page (and an awesome excuse to read and talk about AWK here on HN :) ), but I want to be sure that I'm not missing the book itself.
What I understand from the page is that the Second Edition of the book will reside in the page when it is released (the reason why it says it is a "placeholder").
I think the page description is quite clear: it contains material related to the book. Not the book itself. So I would guess all downloadable code and perhaps supplementary material.
One of my first big projects at my first job fresh out of college was using sed & awk to semi-automate the transformation of semi-unstructured data into a database.
IIRC I couldn't completely automate it because the data contained author names following naming conventions from all over the world (parsing names correctly is deceptively complex). They had somewhat arbitrary numbers of initials, ranging from 0-3.
Again, IIRC, I could easily accommodate 0 or 1 initial (followed by \.) but trying for more would make the regex I was using too greedy and pull in part of the article abstract. These were scientific books and journals.
So I scripted a sed & awk program to detect the possibility of > 1 initial, and when that occurred, I'd pipe the record into nano for a quick review where I manually inserted the correct \. characters for the initials.
It was decades of back-catalogue publications for digitization, so I sat there for days, listening to music on an original 1st gen iPod, waiting for my duct-taped kludge of a program to pipe one of thousands of records into a nano session every few minutes. This was on an Apple G4 workstation running OS X, where I earned my real bash scripting chops. It was an awful hack by today's standards, but at the time, accomplishing what was expected to be a 1-year-long project in ~1 month was seen as nearly miraculous.
I know lots of people like awk, but I pretend it doesn't exist. Why? Here's my comment on this from 6 years ago[0],
>I used awk until I learned Python (long ago). For me, awk was yet another example of the "worse is better" approach to things so common in unix. For example, if you make a syntax error, you might get a message like "glob: exec error," rather than an informative message. "Worse is better" is probably a good strategy in business and for getting things done, but still, mediocrity and the sense of entitlement that so often goes with carelessness, sickens me.
You are missing out. As a former data engineer/current SRE, I spend my entire day with VSCode/Python/Notebooks/CoPilot banging out python code - but whenever I need to do a complex analysis of a semistructured text file in < 60 seconds, awk is my twitch reflex tool. It can trivially do state transition based on patterns in the file, as well as populate hashes from one file and use them in analysis of the next file in just a few characters.
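That two-file trick, for anyone who hasn't seen it, is roughly this (a sketch; assumes the first file maps ids to names and the second has ids in its first column):

    # FNR == NR is true only while reading the first file: build the hash there.
    FNR == NR  { name[$1] = $2; next }
    # Second file: annotate each line with the looked-up name, if any.
    $1 in name { print $0, name[$1] }

Run as: awk -f join.awk ids.txt data.txt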
Awk's claim to fame in my world is that its cognitive activation energy, for anyone who has taken the 3-4 hours to learn the language from start to finish (and that's the awesome thing about the language - it really is about 3 hours of concentrated attention), is essentially nil. You see a bunch of ugly, not-really-structured 500 MB text files that you can't pull into pandas or easily parse into python dicts? No problem - awk will tear through them for you and get the information you want in < 60 seconds, including the time you took to write your (almost always single) line of code.
Point taken. I have a Python program that is an elemental version of awk, and I use that for the odd task. I can modify it if needed and I have the entire Python library to help me. Is the text Unicode? HTML? These little details matter.
I'm not complaining that someone banged out awk (speaking figuratively) on a Friday afternoon to do something and not have to stay after work. Excellent! My complaint is that the failure to address technical debt has negatively affected the productivity of millions, if not tens of millions, of people, often working under pressure, for DECADES.
I'm not sure what technical debt you are referring to. Awk is designed to do one very simple job, and it does so using a language that I can usually teach to new SREs in < 2 Hours with 9-10 follow up tasks that drill in their understanding.
It's benefited from extraordinarily enlightened stewardship, kept its minimalism and strengths, and will finally get a key enhancement (UTF-8 support).
The first edition manual is probably the greatest example I've ever seen of technical writing as well.
I will bet you $1000 that time spent learning Awk will lead to better results much faster than time spent polluting your privileged user directories with Python's excuse for "dependency management"
For many python users, it’s the only language they know. Often, they see programming in python as part of their “identity”, so they’re overly invested in it, to the detriment of other wonderful languages, like awk.
I used to code perl myself, back in the day - but I came to appreciate the simplicity of awk, and now it’s one of my favourites. I no longer code perl, as a consequence, as I believe awk to be far more elegant! I wouldn’t have done so, if I was overly invested in being a “perl programmer”.
Specifically, Awk is a good solution to a problem that should never have existed in the first place. Why am I having to write these bespoke parsers for the random mess of output formats that you get from the UNIX command line?
Well, the fact is that I have to write such parsers. That's very sad, but has no chance of being fixed. So it's good to know Awk.
I think Erik Naggum had this exact criticism of Perl.
Seems like the best time to ask since this is an awk thread: if anyone has a line on the original artwork or a source for the awk t-shirt please let me know. From memory it's of a gangly bird jumping / parachuting from an airplane (DC3?) and captioned with awk's infamous catch-all error message: "Awk: bailing out near line one".
One of the first utilities I had to get to grips with way back was awk, and it serves me well to this day. Best bang for buck investment of time in my entire career. Even today I still use some variant of awk -F(x) '{print $x}'.
This is good news, because you have to pay a lot for a used copy of the first edition nowadays. I hope the spirit remains the same as in the first edition.
I read the first edition so many times as a young kid... AWK was just such a cool name when I would go to the library and grab a book out of the stacks trying to learn something new.
Honestly after watching a lot of Kernighan interviews and reading his original book on C he is a very great communicator. I wonder how different the software world would have been without him at Bell Labs. Would Unix and C have become as widely used as quickly?
Awk is old but great, designed to chew through lines of text files with ease, and has great defaults that minimize the amount of awk code you actually have to write to do anything. It's underrated.
I like the idea of Unix pipelines, but I hate all the sublanguages, awk being one of the biggest. I scratched my itch and built my own shell, marcel: https://github.com/geophile/marcel.
I mention this specifically, here, because of the CSV point. Marcel handles CSV, e.g. "read --csv foobar.csv" reads the foobar.csv file, parses the input (getting quotes and commas correct), and yields a stream of Python tuples, splitting each line of the CSV into the elements of the output tuples.
Marcel also supports JSON input, translating JSON structures into Python equivalents. (The "What's New" section of marcel's README has more information on JSON support, which was just added.)
I usually use this awk function to parse CSV in awk:
# This function takes a line, i.e. $0, and treats it as a line of CSV, breaking
# it into individual fields and storing them in the passed-in field array. It
# returns the number of fields found, 0 if none found. It takes account of CSV
# quoting, and also commas within CSV quoted fields, but doesn't remove the
# quotes from the parsed fields.
# use in code like:
#   number_of_fields = parse_csv_line($0, csv_fields)
#   csv_fields[2]    # get second parsed field in $0
function parse_csv_line(line, field,    _field_count) {
    _field_count = 0
    # Treat each line as a CSV line and break it up into individual fields
    while (match(line, /(\"([^\"]|\"\")+\")|([^,\"\n]+)/)) {
        field[++_field_count] = substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART+RLENGTH+1, length(line))
    }
    return _field_count
}
It's not perfect but gets the job done most of the time and works across all awk implementations.
I FINALLY started learning awk in the past couple weeks. I think I was intimidated because awk can be very terse, and there are some default actions that aren't clear when you first start looking at awk scripts.
My other problem is that I want to accomplish things, not learn a tool, and it generally takes me a bit longer than it should to decide to actually learn something and not just hack at it.
Yes, because you'll be done with your thing before others figure out how to lay out your spreadsheet. Also, your solution will be reusable.
(based on my experience where people who could've benefited from awk for a one-liner dependably reach for sheets/excel rather than something like python or perl)
I wish I used awk all the time, but every time I use it the knowledge I gain doesn't stick. Could be due to its arcane syntax, which is just too hard for me to remember.
Yeah this solves the "I don't use it enough to remember it problem". ChatGPT eliminates the first hurdle of using it, so I'm likely to use it more, and then hopefully it will start to stick.