The Awk Programming Language, Second Edition (awk.dev)
554 points by 0x54MUR41 on June 29, 2023 | 155 comments



I was privileged to be one of the technical reviewers for this book. There's a fair bit of the original content (which is still great), but Kernighan's done a great job with some good restructuring and some significant updates, too. The early chapters are very hands-on, with something of a focus on "exploratory data processing", particularly with CSV files. Big data with AWK, you could say.

Gawk and awk will soon have a new "--csv" option that enables proper CSV input mode (parsing files with quoted and multiline fields per the CSV RFC). I'm really glad Arnold Robbins added a robust "--csv" implementation to Gawk, too, because that's really the most heavily used version of AWK nowadays. I've already got CSV support in my own GoAWK implementation, and I'll be adding "--csv" to make it compatible.
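To give a feel for it, usage should look roughly like this once the flag lands (the second line is my expectation of the dequoted field):

    $ echo '"foo","bar,baz"' | awk --csv '{ print $2 }'
    bar,baz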

I'm really glad this new updated version is coming out!


It's a crying shame we never settled on a control-character-separated text format. There are ASCII control characters for record and field (unit) separators. A bit of user-space support for that would have been great.


As I recall, you can tell Awk to use the control characters as record and field separators. Not helpful if you're getting your data from others, but if you're working by yourself, you have the option. I've come to use control characters as a default because it makes life so much easier.
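For example, a minimal sketch that uses the unit separator (octal 037) as FS and the record separator (036) as RS, with "data" standing in for your file:

    awk 'BEGIN { FS = "\037"; RS = "\036" } { print $2 }' data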


What do you recommend for viewing and editing such files?


Visidata works with arbitrary separators. I just tried with a CSV separated with ␟ (ASCII unit separator) and it worked just fine.


Excel too?


lolive, VisiData has some Excel support. However, don't expect VisiData to be a full blown editor for Excel files. It can provide a view of the data in an Excel spreadsheet.


I think GP means to ask if Excel can read files with ASCII 29,30,31 separators.


If you have a python installation available, openpyxl[1] is great both for converting to .csv and for packaging .csv outputs as .xlsx (which is really zipped .xml, anyway).

[1] https://openpyxl.readthedocs.io/en/stable/index.html


It is a shame. I have been using tab-separated sheets recently, as it allows me to simply not care about almost any possible character in my strings... apart from tabs, of course. But those are far less common than commas, and putting strings in quotes 100% of the time looks messy to me.


Way less common would be using ASCII 30 and ASCII 31. Add ASCII 29 and you can cram multiple datasets into one file.


Some discussion of that here: https://news.ycombinator.com/item?id=31220841

To be really useful as a format it would just need for text editors to:

- display something distinct for the field separator (some editors do this)

- treat the record separator character like a carriage return (not aware of any editors that do this)


>To be really useful as a format it would just need for text editors to: display something distinct for the field separator

Which would be trivial too.


The programming might be straightforward. Trying to persuade the product owners to do it is a different matter.


Pull request?


Yeah, that would work /s


After you. ;0)


> To be really useful as a format it would just need for text editors to

This made me think of WordPerfect's "reveal codes" functionality. :)

(Word's "Reveal Formatting" is supposedly similar.)


Right. The issue is the user space support at the end of the day.


Tab-delimited "csv" formats are quite common (e.g. the CoNLL format family for many natural language processing tasks) and have been supported by common tools such as MS Excel for decades already.



Most important comment I have ever read on HN!


> Gawk and awk will soon have a new "--csv" option that enables proper CSV input mode

Awesome!!!! Super excited to see this!


Awk is really great. For those who know nvm [1]: I used awk to make `nvm ls-remote` run more than 10 times faster [2] by replacing the related shell script with around 60 lines of awk [3], and I was quite happy with the improvement.

It's not really a one-liner, nor anything big, but one can take it as an example of how awk is really not just for one-liners.

Meanwhile, having `--csv` support is really nice. I'd also like to see things like a builtin `length` function become standard.

[1]: https://github.com/nvm-sh/nvm/ [2]: https://github.com/nvm-sh/nvm/pull/2827/ [3]: https://github.com/nvm-sh/nvm/blob/9a769630d7/nvm.sh#L1703-L...


But length() is standard POSIX, no? Even length(array) has been approved by POSIX [1] but not yet included in the spec (they're very slow to update the spec for some reason). Both forms have been supported in onetrueawk, Gawk, mawk, and Busybox awk for a long time.

[1] https://www.austingroupbugs.net/view.php?id=1566
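A quick sanity check (prints the array length and a string length; output shown as I'd expect it):

    $ awk 'BEGIN { split("a,b,c", arr, ","); print length(arr), length("hello") }'
    3 5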


Ah, good to know, thank you! Meanwhile I found it: it was the awk in Debian 10 that lacked `length(array)`; see the related nvm PR [1].

[1]: https://github.com/nvm-sh/nvm/pull/2917/


Our data product is delivered in CSV format. Even though I create user documentation mainly using csvkit, grep, and sed, I would love to convert all those solutions to AWK. Sometimes AWK is more readable than sed, and csvkit requires installation.

It would be nice to have an AWK cookbook for CSV. In terms of CSV manipulation and querying there is only a limited number of operations, and I think there is potential to standardize those operations using AWK.


Ben is not just any old technical reviewer. He wrote a version of AWK in Go and has done a ton of other work in the AWK ecosystem.


Thanks! ICYMI: GoAWK [1] - A POSIX-compliant AWK interpreter written in Go, with CSV support.

[1]: https://github.com/benhoyt/goawk


And it can be embedded in your Go programs as well.


It's nice that everyone is supporting this. I've written a portable awk module that takes control of the parsing, and it is SLOW (and a little buggy). I'm a little bummed that nobody will use it, but this is truly a step in the right direction.

I guess for the people that are still using nawk, you can set up an AWK envvar so you can { awk -f $AWKU/ucsv.awk -f <(echo '{print NR, $1}') }

https://github.com/Nomarian/Awk-Batteries/blob/master/Units/...


Would you say the first few chapters are enough to get the 75-80% usefulness for mere mortals like me who will never try to master the full language? Or is the material fairly sprinkled throughout the whole tome?


Yes, definitely. The first three chapters would be more than enough for that: 1) An Awk Tutorial, 2) Awk in Action, and 3) Exploratory Data Analysis. For most people who just want to use AWK for one-liners on the command line, you can stop there. The rest of the chapters are about writing larger (still small! but not one-liner) programs in AWK to create reports, little languages, and experiment with algorithms.


Thanks!


Fantastic news. I’ve tried lots of new CLI tools but they always seem to fall between too little functionality (e.g. xsv) and too much (VisiData). AWK is just right.


This is amazing, I may never use pandas again


Awk is awesome! Glad that they are looking to modernize the book. It wasn't really necessary: all the code examples in the original edition still run just fine, although some are somewhat dated, like printing ASCII bar graphs. They also had examples of writing VMs, parsers, and interpreters, which run on modern implementations.[0]

The language has some quirks. To declare temporary variables, it's common practice to add extra, unused arguments to functions. And the traversal order of associative arrays is implementation-dependent. I'm not sure what the situation is regarding locale and UTF-8 support.
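The temporary-variable quirk looks like this (a sketch; the function is just for illustration):

    # i and s are "locals": extra parameters the caller never passes,
    # conventionally set off with extra spaces in the parameter list
    function join(arr, n,    i, s) {
        for (i = 1; i <= n; i++)
            s = s (i > 1 ? "," : "") arr[i]
        return s
    }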

EDIT: Looks like Brian Kernighan added Unicode support last year.[1]

[0] https://github.com/siraben/awk-vm/blob/master/vm.awk

[1] https://github.com/onetrueawk/awk/commit/9ebe940cf3c652b0e37...


What would you suggest as an alternative to printing ASCII bar graphs? I do that all the time. Takes 20 seconds and often makes distributions, modalities, and patterns over time obvious right away.
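The whole thing is usually a one-liner; a sketch, assuming a hypothetical counts.txt with "label value" rows:

    $ awk '{ printf "%-8s ", $1; for (i = 0; i < $2; i++) printf "#"; print "" }' counts.txt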


`sparklines`[1] is good for an overall low-res view. `termgraph`[2] is sometimes better for a higher-res, more capable view (but can be finicky about the data.)

[1] https://github.com/deeplook/sparklines

[2] https://github.com/mkaz/termgraph


But both require depending on a third-party library -- hardly something to pull in on a whim if ASCII bar charts do the job?


Sure but, e.g., sparklines can show me the shape of my 60 numbers[1] more effectively on a single line of 60 characters[2] than an ASCII bar chart which would be 60 lines (without binning).

[1] 27623 14272 22218 21267 19037 989 27116 32405 23261 27104 7793 9432 7776 28832 13521 10783 29261 32193 30367 20358 22611 2023 19607 9844 3516 6510 16533 8378 22986 17043 14628 13392 22799 23847 29212 23690 17779 17059 28211 26180 32061 22740 7911 12018 4508 9801 9578 15350 9554 15517 11112 405 22054 2743 26609 7843 713 10975 2830 1126

[2] http://rjp-hosted-files.s3.amazonaws.com/sparkline-demo.png


gnuplot is an alternative that is available on almost as many systems as awk, and can do the job as well

edit: this prompted me to write up a little note showing how: https://notes.billmill.org/visualization/graphs/gnuplot/A_ba...
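For the purely in-terminal case, something like this works (a sketch; plots the squares of 1-20 as an ASCII line chart):

    $ seq 1 20 | awk '{ print $1 * $1 }' | gnuplot -e "set terminal dumb; plot '-' with lines notitle"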


If you do this sort of thing more than once ever, look at the feedgnuplot tool (http://github.com/dkogan/feedgnuplot). It'll make your life easier


Neat! Once you're installing something to do terminal plots though, https://github.com/red-data-tools/YouPlot looks the nicest I've seen.

(The nice feature of feedgnuplot of course is that you can _also_ render the plots to images, which youplot can't)


> which youplot can't

But you could feed them through `textimg`[1] to generate PNGs.

[1] https://github.com/jiro4989/textimg


lol, I love it


Is there a particular benefit in writing a VM in AWK, placed in a big BEGIN block? Very similar code can be written in Perl or Python. Isn't the strength of AWK in its line-matching capability, being able to pattern-match a line against a block of code?


> Is there a particular benefit in writing a VM in AWK

Not really. Later on, the book just ran out of line-matching examples to go through and started doing regular programming instead :P. When I actually write AWK code I rely on line-matching and use a variable to handle state.


At the time, awk was the only scripting language (other than shell) generally available on Unix systems. Perl, Tcl, Python didn't exist yet. So awk was often used for general-purpose programming.


AWK runs everywhere. Perl and Python do not.

Busybox has their own independent AWK implementation.

https://busybox.net/ https://frippery.org/busybox/

Also see the first edition of the AWK manual online here:

https://archive.org/details/pdfy-MgN0H1joIoDVoIC7


I'm sorry, where does Awk run but Perl and Python don't?

If we're counting minimalist implementations, there's MicroPython, which even runs on microcontrollers that cost less than 2 dollars.


The POSIX standard mandates awk.

It does not mandate Perl or Python.

<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/a...>

<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/c...>

There are many systems which lack Perl or Python, but include awk.

You might be carrying an Android device at the moment --- if you drop to its default userland, that provides a bunch of utilities, including awk, via Busybox. But not, so far as I'm aware, either Perl or Python.

(You can of course install Termux which will then give you both Perl and Python, along with Node.js, ruby, and a whole slew of other scripting and compiled languages. But so long as we're considering stock installs, it's sed and awk.)


Small, embedded, busybox systems - per the parent comment.


Like I said, MicroPython can run even on microcontrollers.


I think the parent may be implying "living off the land" situations. Awk is ubiquitous, and part of the POSIX standard. MicroPython, not so much.

I’m a big fan of Forth, but it’s not available everywhere. One has to adapt.


I love telling my programming-language friends about that example.

> Hey you should read the AWK book, it even says how to write a VM!

> Why would I ever want to use AWK for that?

> Well, the input is a text file with one space-delimited instruction per line.

> Hmm... You have a point.


On VMs, why not a Z-machine? It's ideal for this.


I love awk. It’s everywhere and every time I am writing a shell script and work myself into a corner, awk has been the way out.

I know exactly enough to be dangerous and have meant to deep dive for almost a decade.


awk can be mastered by just reading the man page. The book doesn't take long to read either. Once you understand the simple principles, you can write an infinite number of scripts for all kinds of tasks.


See, when I'm writing a shell script interactively and work myself into a corner, I reach for awk, struggle with it for a bit, and then either:

1) succeed, and regret the messiness of the solution

or

2) fail, and find a non-awk way to handle it.

I really tried to like awk, but its portability hasn't been enough of a feature to raise it above other scripting languages for me. Especially if I'm going to end up in an editor anyway.


Thanks for your work! Awk is a rabbit hole.

"Dark corners are basically fractal - no matter how much you illuminate, there is always a smaller but darker one." - - Brian Kernighan (quoted in the GNU Awk book)


Awk has always been a language that I loved, but I have struggled to use it beyond quick jobs parsing text files. I understand it is meant to be used for exactly that, but the fact that it is simple, fast and lightweight sometimes makes me want to do something more with it. When I start trying to do something besides parsing text, though, I find that it starts becoming awkward (pun intended?).


> but the fact that it is simple, fast and lightweight

I see awk as a DSL to be honest. Yes, it can be used as a general purpose language, but that quickly becomes, as you say, awkward :D

Like many DSLs, it is simple, fast and lightweight as long as it is used for its intended purpose. Once you start using it for something else, these advantages evaporate pretty quickly, because then you have to essentially work around the DSL design to get it to do what you want.


DSL == Domain Specific Language?


Yes


One simple thing I do with awk is create a command processor: read one line at a time and act on my data in response. This is very useful because you can make your commands as powerful as needed and call other unix tools as a result.
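A minimal sketch of the shape (the file name and command verbs are made up):

    # load the data up front, then treat each line on stdin as a command
    BEGIN { while ((getline line < "data.txt") > 0) data[++n] = line }
    $1 == "show" { print data[$2] }
    $1 == "grep" { for (i = 1; i <= n; i++) if (data[i] ~ $2) print data[i] }
    $1 == "run"  { $1 = ""; system($0) }   # shell out to other unix tools
    $1 == "quit" { exit }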


Do you have an example of this that is available somewhere?


I find it pretty nice for writing simple preprocessors. For example I have one which takes anything between two marker lines and pipes it through a command (one invocation per block). Awk has an amazing pipe operator which lets you do something like this:

    ... {
        print $0 | "command"
    }
"command" is executed once, and the pipe is kept open until closed explicitly by close("command"), at which point the next invocation will execute it again. The command string itself acts as a key for the pipe file descriptor.

And of course, no mention of awk is complete without the "uniq" implementation, which beats the coreutils uniq in every way possible (by supporting arbitrary expressions as keys and not requiring sorted input):

    !a[$0]++


I had no idea about this "keep the pipe open" behaviour. I thought it would spawn the binary on every print statement and thus didn't consider it in the past. But now...


This is exactly why I moved from AWK to Perl for these quick jobs a couple of years ago. If you stick to an AWK-like subset, Perl is also simple, fast and lightweight. If you want to grow your scripts (and you have a lot of discipline) Perl – in contrast to AWK – gives you enough noose to hang^W^W^W^Wthe tools you need.


Perl? Wow. Is that better than bash, python or even nodejs? Why write in Perl over these? Serious question; I was propagandized to hate Perl.


I write bash, python and nodejs all day, and have no professional history with Perl.

One day while avoiding working on something important, I spent half a day learning Perl in order to implement something related to a build tool that was being used in the important thing I was avoiding.

I was blown away. It's a really delightful language. Its big downfall is that it makes it feel good to do something "clever."

Perl is a joy to write, and a devil to read. I liked it, and wish I had started my career earlier so I could have enjoyed Perl in its heyday.

I have similar feelings about Ruby.


You need to make sure that you write the clever bits clearly. Maybe add a comment. It takes some discipline, but isn't hard.

In fact, Perl remains remarkably robust if you stack clever tricks on top of each other.


The same shortcut syntax that people complain about does make perl really handy for one-time tasks where you're iterating on ideas. Lots of features there that make that easy. One example:

  #!/usr/bin/perl
  while (<>) {
      # various processing here
      # $ARGV is set to either "-" for piped input, or the current filename
      # $_ is the data of the current line
  } 
That (<>) construct accepts data from stdin, redirection or file(s) named as arguments and iterates over the data. There's lots of things like that throughout the language.


And you can avoid even that minor boilerplate with the -n or -p flag. It even supports BEGIN and END like awk.
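For example:

    $ printf '3\n4\n5\n' | perl -ne '$sum += $_; END { print "$sum\n" }'
    12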


> Perl? Wow. Is that better than bash, python or even nodejs? Why write in Perl over these?

It depends on scale.

If you have some quick parsing to do, then awk will get you started quickly, but as you expand your experimentation on what you want to extract/manipulate, it may not be easy to add onto the awk beginnings of your "one liner".

But if you start with awk-like† syntax but invoke it with Perl, then if you find you have to expand, Perl has more elbow room.

The intention is not to 'go big', which those other languages may be better at, but to more easily 'start small'.

† IIRC, Larry Wall wanted a utility that had awk/(s)ed-like syntax for text manipulation, just 'with more'.


Have you ever tried to dig a hole? What tool did you use?

- Want to cut through and move loam, compost, sandy, and compacted soil? You're gonna want a rounded shovel.

- Want to break up rocky, clay soil? A pick mattock will penetrate deep, breaking up soil, shattering smaller rocks, and is used as a lever to uproot. A tiller is a faster method but disturbs the soil more.

- Want to dig a narrow, deep hole? An auger will quickly break up rocks and soil in a shaft and move them upwards.

What do you use the Perl tool for?

- Quickly and efficiently open files, read line by line, analyze text, and perform any kind of operation you can think of, with complex data structures, objects and modular code, using very few lines of code.

- Executing external commands with a shell, returning their output, and making complex yet short programs easily with arguments to the interpreter from a command line.


Perl can do sh/awk/sed and a bunch more at once.


Absolutely. It is comparable to python in some ways, but makes it much easier to write quick one-liners using regexes and data manipulation, and to scale those up to real programs. It fills the gap between bash scripts using awk, grep and sed, and C/java/C#. Compared to bash scripting, perl is a real programming language. The documentation and library ecosystem are excellent, backwards compatibility is legendary, yet it supports modern Unicode. The syntax is weird, but try it for a bit, read the man pages, it's not that hard. The OO system is weirder, and I wouldn't make complex class hierarchies in it, but it is usable.


I like how Awk is just a single executable. A single-executable Perl that includes only the core library would be great. There is Microperl [0, 1], but no idea how well it compiles with more up-to-date Perl versions.

0: https://github.com/bentxt/microperl-standalone

1: Original article from 2000 by the author Simon Cozens: https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0003.html


Perl better? Maybe, maybe not.

It can be very useful, and Perl scripts are pretty robust. I often found Perl scripts running for years and years without issues at different companies.

My main issue with Perl scripts is that they often are not "readable" by anybody but the original creator, who of course has left the company. (Not a fault of Perl itself, though.)

But your mileage may vary, and any script can be made (un)readable.


I've always found it weird that people bash on Perl relentlessly for being hard to read and then turn around and praise Rust's syntax when it is full of stuff like this:

    fn print_d(t: &'static impl Display) {


>> My main issue with Perl-scripts is that they often are not "readable" by anybody but the original creator.

Anyone writing Perl scripts like this should not be trusted with any programming language.

Perl scripts are no less readable than bash scripts or Awk scripts. This is because so much of Perl was written to do the same work as bash, awk, sed, and the other related Unix text processing command line programs, but all under one roof.

Don't believe me? Take a look for yourself:

https://learn.perl.org/

http://blob.perl.org/books/impatient-perl/iperl.htm


Perl can also be hilariously unreadable: https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html


>> Perl can also be hilariously unreadable: https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html

Most programming languages can be obfuscated. That does not mean people write code in those programming languages like that:

C: https://www.ioccc.org/

Javascript: view-source:https://www.google.com/

The truth is that insulting Perl is considered stylish by some, so many people do it despite knowing little to nothing about Perl and having never used it.

However, if you want Perl to be hilariously unreadable, why not write it in Latin:

https://metacpan.org/dist/Lingua-Romana-Perligata/view/lib/L...

Or Klingon:

https://metacpan.org/pod/Lingua::tlhInganHol::yIghun


It's like when we Gen-Xers were repeating bad stuff about COBOL without having seen a single line of it.

Then I saw a real COBOL program and... well... it was even worse than what I had imagined :-)


Perl has both one-liners/spaghetti code and games like Pangzero.


There's a limited problem domain where it's unquestionably the best. Perl beats awk and bash at their own game on their home turf. That's the best way to put it. It's faster, has more shortcuts, fewer warts, more power, and more readability when well written, and while aged and not huge by modern standards, CPAN (like PyPI or npm) is incredible for a hyper-powered awk-and-bash mash-up for those tasks at the edge of that limited problem domain. It's installed almost everywhere, so almost always available.

That stuff is just awkward and painful in Python by comparison.


I don't write Perl code, but its CLI has been a very good way to replace sed with something decent. sed not supporting Perl regex syntax, the most common kind of regex out there by far, is frankly disappointing. Even grep was able to put it together and add the -P switch. But sed is still stuck with the prehistoric syntax of ERE ("Extended Regular Expressions", as described in man pages), which e.g. instead of \d for a digit uses [[:digit:]], a syntax present in... zero? other tools or programming environments.
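Side by side:

    $ echo 'order 66' | sed -E 's/[[:digit:]]+/N/'
    order N
    $ echo 'order 66' | perl -pe 's/\d+/N/'
    order N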


Better than BASH? Mostly. Better than Python? Subjective, as you would have to use them both yourself. I lean towards Perl as I like sigils to denote things. I have nothing against Python though. Both are typically installed by default now. I have never used nodejs for sysadmin work.


Perl is super-specialized at reporting (that's in fact the "r" in Perl). In particular there's a bunch of extremely useful implicitly defined variables that take their context from your place in a line-by-line loop through a text file.


Perl is a great language, but please listen to this old perl programmer's advice:

1. You can write totally unreadable perl. It is probably the single worst language in this regard that most programmers will run into. Be careful to make your code readable.

2. Keep your Perl scripts small; 200-300 lines is a good limit.

So for quick bang it out scripts that want to parse text etc... perl is great. For writing a major application, not so much.


One other advantage is that Perl will be found in the base install of almost any unix-like system. Python, nodejs, even bash may not.


When discussing such languages, I would like to point out that Raku is also an option.


I have found a handful of unconventional applications for awk -- I once needed a tiny PCM pulse-wave generator, and awk was surprisingly decent for the job [1].

Aside from that I've mostly been using it for quick statistics [2], but it quickly moves into perl territory...

1: https://github.com/9001/asm/blob/hovudstraum/etc/bin/beeps#L...

2: https://ocv.me/doc/unix/oneliners/#965bfcb8


It's a language for creating quick alternative views from line- and column-oriented text streams. That means, take the output of another tool and represent it in a different way.


I use awk mostly for one-liners and resort to Python when I need more than a few lines of code.


Ok, dumb question: is the link supposed to point to the actual book (i.e., is the book free and/or open source), or is this just a page of miscellaneous interesting links about the book (which we can pay for later, when it's published)?

I was expecting the book, but the page itself says "This page is a placeholder for material related to the second edition of The AWK Programming Language."

It's fine if this is a placeholder page (and an awesome excuse to talk about AWK here on HN :) ) but I want to be sure that I'm not missing the book itself.


What I understand from the page is that the second edition of the book will reside on that page when it is released (the reason it says it is a "placeholder").


I think the page description is quite clear: it contains material related to the book, not the book itself. So I would guess all downloadable code and perhaps supplementary material.


Amazing, takes me back.

~

One of my first big projects at my first job fresh out of college was using sed & awk to semi-automate the transformation of semi-unstructured data into a database.

IIRC I couldn't completely automate it because it contained author names from global naming conventions. (Parsing names correctly is deceptively complex.) They had somewhat arbitrary numbers of initials, ranging from 0 to 3.

Again, IIRC, I could easily accommodate 0 or 1 initial (followed by \.), but trying for more would make the regex I was using too greedy and pull in part of the article abstract. These were scientific books and journals.

So I scripted a sed & awk program to detect the possibility of more than one initial, and when that occurred, I'd pipe the record into nano for a quick review where I manually inserted the correct \. characters for the initials.

It was decades of back-catalogue publications for digitization so I sat there for days, listening to music on an original 1st gen iPod, waiting for my duct-taped kludge of a program to pipe one of thousands of records into a nano session every few minutes. This was on an Apple G4 workstation running OS X, where I earned my real bash scripting chops. It was an awful hack by today's standards, but at the time, accomplishing what was expected to be a 1-year long project in ~1 month, it was seen as nearly miraculous.


I know lots of people like awk, but I pretend it doesn't exist. Why? Here's my comment on this from 6 years ago[0],

>I used awk until I learned Python (long ago). For me, awk was yet another example of the "worse is better" approach to things so common in unix. For example, if you make a syntax error, you might get a message like "glob: exec error," rather than an informative message. "Worse is better" is probably a good strategy in business and for getting things done, but still, mediocrity and the sense of entitlement that so often goes with carelessness, sickens me.

[0] https://news.ycombinator.com/item?id=13457265

Long live the Unix Hater's Handbook! (Unix is fine, and so are the criticisms therein; some of those criticisms have been eclipsed by ongoing development.) https://en.wikipedia.org/wiki/The_UNIX-HATERS_Handbook


You are missing out. As a former data engineer/current SRE, I spend my entire day with VSCode/Python/Notebooks/CoPilot banging out python code - but whenever I need to do a complex analysis of a semistructured text file in < 60 seconds, awk is my twitch reflex tool. It can trivially do state transition based on patterns in the file, as well as populate hashes from one file and use them in analysis of the next file in just a few characters.

Awk's claim to fame in my world is that its cognitive activation energy, for anyone who has taken the 3-4 hours to learn the language from start to finish (and that's the awesome thing about the language - it really is about 3 hours of concentrated attention), is essentially nil. You see a bunch of ugly, not really structured 500 MB text files that you can't pull into pandas or easily parse into python dicts? No problem - awk will tear through them for you and get the information you want in < 60 seconds, including the time you took to write your code (almost always a single line).

That's Awk's sweet spot.
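The populate-hashes-from-one-file trick is the classic NR==FNR idiom; a sketch with made-up file names:

    $ awk 'NR == FNR { label[$1] = $2; next } $1 in label { print $0, label[$1] }' ids.tsv events.log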


Point taken. I have a Python program that is an elemental version of awk, and I use that for the odd task. I can modify it if needed and I have the entire Python library to help me. Is the text Unicode? HTML? These little details matter.

I'm not complaining that someone banged out awk (speaking figuratively) on a Friday afternoon to do something and not have to stay after work. Excellent! My complaint is that the failure to address technical debt has negatively affected the productivity of millions, if not tens of millions, of people, often working under pressure, for DECADES.


I'm not sure what technical debt you are referring to. Awk is designed to do one very simple job, and it does so using a language that I can usually teach to new SREs in < 2 Hours with 9-10 follow up tasks that drill in their understanding.

It's benefited from extraordinarily enlightened stewardship, kept its minimalism and strengths, and will finally get a key enhancement (UTF-8 support).

The first edition manual is probably the greatest example I've ever seen of technical writing as well.


In general Perl fits that niche for me better, but sometimes awk is what you have.


I will bet you $1000 that time spent learning Awk will lead to better results much faster than time spent polluting your privileged user directories with Python's excuse for "dependency management"


I agree entirely!

For many python users, it’s the only language they know. Often, they see programming in python as part of their “identity”, so they’re overly invested in it, to the detriment of other wonderful languages, like awk.

I used to code perl myself, back in the day - but I came to appreciate the simplicity of awk, and now it’s one of my favourites. I no longer code perl, as a consequence, as I believe awk to be far more elegant! I wouldn’t have done so, if I was overly invested in being a “perl programmer”.


Specifically, Awk is a good solution to a problem that should never have existed in the first place. Why am I having to write these bespoke parsers for the random mess of output formats that you get from the UNIX command line?

Well, the fact is that I have to write such parsers. That's very sad, but has no chance of being fixed. So it's good to know Awk.

I think Erik Naggum had this exact criticism of Perl.


Seems like the best time to ask since this is an awk thread: if anyone has a line on the original artwork or a source for the awk t-shirt please let me know. From memory it's of a gangly bird jumping / parachuting from an airplane (DC3?) and captioned with awk's infamous catch-all error message: "Awk: bailing out near line one".


Currently looking @ alternatives (not that I dislike AWK, far from it):

Tokay: https://github.com/tokay-lang/tokay

frawk: https://github.com/ezrosent/frawk


Have you considered tab?

https://tkatchev.bitbucket.io/tab/


What do you think of them? Tokay in particular looks very polished.


TBH: no conclusion yet (haven't found time ATM to try them out in full detail)... sorry.


Take a look at marcel: https://github.com/geophile/marcel


Have to wait, as "The book will be available by the end of September"

See https://hn.algolia.com/?q=The+AWK+Programming+Language for discussion on the first edition

Didn't know there was a list of `awk` implementations: https://www.gnu.org/software/gawk/manual/html_node/Other-Ver...


One of the first utilities I had to get to grips with way back was awk, and it serves me well to this day. Best bang for buck investment of time in my entire career. Even today I still use some variant of awk -F(x) '{print $x}'.


This is good news, because you have to pay a lot for a used copy of the first edition nowadays. I hope the spirit remains the same as in the first edition.


I read the first edition so many times as a young kid... AWK was just such a cool name when I would go to the library and grab a book out of the stacks trying to learn something new.


I don't know about Awk, but I feel the urge to write a library named "ward" for it.


Maybe the person who deals with security issues for an awk implementation could be called the awk ward.


Also watch his recent interview on Computerphile: https://www.youtube.com/watch?v=GNyQxXw_oMQ

And: Brian Kernighan adds Unicode support to Awk https://news.ycombinator.com/item?id=32534173


With Lex Fridman, from ~2 years ago:

* https://www.youtube.com/watch?v=O9upVbGSBFo


Honestly, after watching a lot of Kernighan interviews and reading his original book on C, I think he is a great communicator. I wonder how different the software world would have been without him at Bell Labs. Would Unix and C have become as widely used as quickly?


Kernighan is a national treasure and a world treasure!


I wish awk had support for addressing a range of fields, like from $1 to $7. `cut` supports it, FWIW.


You can always loop through the fields, but it’s a little messy, especially for one-liners.
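Something like:

    $ awk '{ for (i = 1; i <= 7; i++) printf "%s%s", $i, (i < 7 ? OFS : ORS) }' file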


Yes, that's an option. Range lookup is an ergonomic feature. Imagine what it would have been like if we couldn't do foo[-3:] in Python.


Awk is old but great, designed to chew through lines of text files with ease, and has great defaults that minimize the amount of awk code you actually have to write to do anything. It's underrated.


I love using Awk, the only thing I miss is that it can't handle complex csv files. Does anyone know how to handle quoted CSV strings like

> "foo","bar,baz"


I like the idea of Unix pipelines, but I hate all the sublanguages, awk being one of the biggest. I scratched my itch and built my own shell, marcel: https://github.com/geophile/marcel.

I mention this specifically, here, because of the CSV point. Marcel handles CSV, e.g. "read --csv foobar.csv" reads the foobar.csv file, parses the input (getting quotes and commas correct), and yields a stream of Python tuples, splitting each line of the CSV into the elements of the output tuples.

Marcel also supports JSON input, translating JSON structures into Python equivalents. (The "What's New" section of marcel's README has more information on JSON support, which was just added.)


If quoted strings are the only extra thing you need to handle (i.e. no escaped quotes, newlines, etc.) and you have GNU awk:

    $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $1}'
    "foo"
    $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
    "bar,baz"
For a more robust solution, see https://stackoverflow.com/q/45420535 or use other tools like https://github.com/BurntSushi/xsv


I wanted to ask why not the simpler form:

    $ echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $1}'
    "foo
    $ echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $2}'
    bar,baz
    $ echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $3}'
    boo"

Realizing that I have to strip the quotes that remain.

Edit: formatting.

Edit, again: from your link, the following is more terse and to my taste (still needs the strips):

    awk -v FPAT='("[^"]*")+'


I usually use this awk function to parse CSV in awk:

    # This function takes a line, i.e. $0, and treats it as a line of CSV, breaking
    # it into individual fields and storing them in the passed-in field array. It
    # returns the number of fields found, 0 if none found. It takes account of CSV
    # quoting, and also commas within CSV quoted fields, but doesn't remove the
    # quotes from the parsed field.
    # use in code like:
    #   number_of_fields = parse_csv_line($0, csv_fields)
    #   csv_fields[2]  # get second parsed field in $0
    function parse_csv_line(line, field,   _field_count) {
      _field_count = 0
      # Treat each line as a CSV line and break it up into individual fields
      while (match(line, /(\"([^\"]|\"\")+\")|([^,\"\n]+)/)) {
        field[++_field_count] = substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART+RLENGTH+1, length(line))
      }
      return _field_count
    }
It's not perfect but gets the job done most of the time and works across all awk implementations.


Convert it with Miller first:

    mlr --icsv --otsv cat examplefile
* https://miller.readthedocs.io/en/latest/10min/


Yes, this is what csvquote does. It does nothing else, just this so that programs like awk, sed, cut, etc. can work properly.

https://github.com/dbro/csvquote


They are planning built-in support for that, see that other comment https://news.ycombinator.com/item?id=36518146


I FINALLY started learning awk in the past couple weeks. I think I was intimidated because awk can be very terse, and there are some default actions that aren't clear when you first start looking at awk scripts.

My other problem is that I want to accomplish things, not learn a tool, and it generally takes me a bit longer than it should to decide to actually learn something and not just hack at it.

Is it still worth it to be "the awk guy" at work?


yes, because you'll be done with your thing before others figure out how to lay out their spreadsheet. also your solution will be reusable.

(based on my experience where people who could've benefited from awk for a one-liner dependably reach for sheets/excel rather than something like python or perl)


Who wrote the second edition?


I read a comment on HN the other day by someone reviewing the book, and I believe they said it was Brian Kernighan.


It was mentioned recently here in another HN thread that Brian Kernighan is writing it.


The lowercase 'bwk' used in the text makes me believe that ...


I wish I used awk all the time, but every time I use it the knowledge I gain doesn't stick. Could be due to its arcane syntax, which is just too hard for me to remember.


Wow, hyped for this.

I picked up this little book from my University library once, and it was a fantastic read.


Awk was great in its time, but when you need to write more than 5 lines of awk code, please consider using Python, since:

1. It is a lot faster than awk/perl/grep/sed combos

2. It is far more readable and maintainable

3. It is more powerful than awk in its string functionality

4. It has had the same availability as awk in OSs for the last decade


Find and AWK together, a match made in heaven. Thanks for the link.


Do you have some resources regarding the use of awk with find?


I love the csv-mode. It obviously takes some time


I am looking forward to this coming out.


Can I preorder this?


Awk and ChatGPT are best friends.


Yeah, this solves the "I don't use it enough to remember it" problem. ChatGPT eliminates the first hurdle of using it, so I'm likely to use it more, and then hopefully it will start to stick.


How so?


Ask ChatGPT to write your awk scripts - it does a pretty damn good job at a first pass.



