I consider awk to be the most useful and underused language in the UNIX ecosystem. I use it daily to analyze, transform, and assemble data, and it always blows my mind that so few people really know how to use it at a decent level. This is an excellent book to give a real idea of what awk is capable of.
> it always blows my mind that so few people really know how to use it at a decent level
Not nearly as surprising as it is to me that now that most developers have forgotten perl, they're turning to awk as an inspiring example of a bygone era.
Seriously: perl basically replaced awk in the mid-90's. It absorbed all the great lessons and added two or three dozen innovations. But it had scary syntax, so everyone used python (which did not replace awk very effectively) and forgot perl. So now we're back at awk. And the scary syntax is all in the Rust world.
Perl deprecates awk & sed [0]. Whereas Python is a bad substitute for those, and because of that, we're back on Awk. I have the feeling that Python simply succeeded as it seems to be a language designed for people that do not enjoy programming, which sadly seem to be the majority in our industry.
The Perl hate is so prevalent in most companies, that you can get into serious issues should you even write a one-liner in it. So here I am, reluctantly using awk/sed and asking too much of Bash/Python.
Ooof. I have to respectfully disagree that perl deprecates awk. If I'm writing awk, it's usually because what I want to do is precisely what awk does by default: read a line, match it against a condition, and perform some action.
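That default loop means a complete program can be just a pattern and an action; a minimal sketch (file name hypothetical):

awk '/ERROR/ { print $2 }' app.log   # for each line matching ERROR, print field 2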
I could do all that in perl, but it's more stuff I'd need to dig up every time I wanted to do something awk-like. People who code perl regularly already have it, for sure. It's a barrier for those of us who don't, and one that awk doesn't have.
Awk isn't perfect, for sure, but I've transitioned from writing simple text processing in perl to doing it in awk because awk provides the basic framework that I'd end up doing ad-hoc and non-idiomatically in perl every time.
Again, all this with that caveat that I'm not a perl programmer. I used it for a couple of semesters in college, which was enough to be dangerous, but not truly proficient.
absolutely shell. Perl acts as glue between multiple processes very nicely. $foo = `foo_command`; is great. Opening named pipes. Python is more awkward because it's less like shell.
If one writes a Perl one-liner where AWK will do, that’s avarice.
I used to maintain and debug a relatively large and an extremely complex build engine in Perl for a living for several years, and there was no construct in there that could not have been easily written in AWK.
Consider which language is sold to people-off-the-street as "How to Program!": it's Python. Most of these people do not enjoy the cognitive effort, detailed typing and symbolic work, algorithmic thinking, architectural thinking, and linguistic thinking that is "programming". At best, they enjoy the end result, but not the process.
Python taught in this way, at least, fools people into thinking that programming is about something far simpler and more toylike than it is. Many "hard" disciplines do this: sell children and lay people on toy experiments that momentarily captivate but have no relationship to what a practitioner of that discipline does.
The older system of recruitment was to polarize hard, i.e., to throw people in at the deep end to scare off all the people who would waste time/resources in training. Whoever remained really wanted to do that thing (e.g. physics, programming, ...).
Today we're doing something perhaps vaguely immoral: selling people on a career that has no relationship to the sales pitch.
To me, that sounds like gatekeeping - as in "it's only programming when it's insanely difficult".
Somebody playing football in a Sunday pub league is still playing football, and still loves playing football, even if they're not at the same standard as Lionel Messi.
For me at least, I feel the same about programming. I love understanding somebody's problem, and building a solution for them that solves that problem. I normally use PHP (and sometimes Python or Ruby or JavaScript) because they make it easier for me to focus on the problem, rather than language details. I can't always solve the problem because it's too difficult, and perhaps some of my solutions are not 'optimal'. But I feel hurt by the idea that because I don't have a strong understanding of how Python works at a really deep level, I'm not a real programmer.
I also think that's a great way to piss people off who are just getting into the industry, and may one day become great programmers - even Linus Torvalds was a junior once. I'd encourage them to keep going, keep learning, and keep helping people solve their problems (and getting paid good money for doing that).
It's only gatekeeping if people want to be past the gate. I'm not talking about deterring people who are interested.
The vast majority of people do not want to be programmers and would not enjoy programming. Delaying the moment they actually have to do something difficult with a programming language is not especially healthy.
How would you feel about a career in football sold to you on the basis of table pong? Keep playing the pong, and then one day, your face is in the dirt and you drop out.
The self-esteem hack psychology of the 60s-90s equated encouraging people with lying to them, as if the only way we can get programmers is by lying about what programming is about. This isn't encouraging anyone, it's lying to them.
I think you’re confusing “programmer” with “10x programmer”. Plenty of people are capable of implementing business logic in code. Very few are capable of designing that logic — they’re the 10x programmers who know all about data structures, algorithms, etc.
You can’t run a business expecting every employee to be a rockstar. It just doesn’t scale. So you skill it down and put the high-skill people where they can have the most impact.
It sounds like we have a different definition of programming. For me, programming is producing code that gets executed by another piece of hardware or software.
The complexity of the code is not part of the definition. Nor is your understanding of the hardware/software involved.
Perl isn’t difficult in the same way that the English language isn’t difficult. Easy to get started with, takes a long time to master. Fortunately for us die-hard perl types it’s quite capable of cleanly solving all normal dynamic language problems, and some seriously abnormal ones. And because of the insanely good backcompat in perl we’ll see the pendulum swing back to perl 5 some time in the next decade.
When I did my comp sci undergrad degree, languages per se were not part of what was taught in the classroom. Whatever language a class was using, be it assembler, Scheme, C or some other higher-level language, it was up to the student to figure out the syntax, how to run the compiler, etc. You got some help and basic examples in the discussion sessions, but it was not part of the lectures. And there was no web, no Stack Overflow, no Google at the time. You had a book about the language, some local newsgroups (and the larger USENET), and that was it.
If you didn't enjoy reading, puzzling, and banging your head against the wall you would wash out after a couple of classes.
What language do you think beginners should be learning that would not keep them from what you say is ‘cognitive effort, detailed typing and symbolic work, algorithmic thinking, architectural thinking, linguistic thinking’. Perl? And why is Python incompatible with those things?
Like many of us, I started programming with BASIC, not C or assembly, but I turned out fine. I don’t think people need to or necessarily can absorb all the high level details straight off. To me python seems well organized enough that it makes a great language for beginners.
As far as the second part of your comment, I think if you make topics like this impenetrably difficult and complex at first, it will not only filter out the non-dedicated, but also those who don’t yet know that they would like to be dedicated to it.
Note that it sounds like a snide remark but it doesn't have to be. One could interpret it as, "Do you want to simply get things done, and not bicker endlessly about microoptimizations or where the braces go? Python is for people who simply want to complete their tasks and then go home and spend their time doing more stuff they think is fun."
It's not necessarily a bad thing to be productive, instead of inventing new problems for yourself to keep having programming things to do.
(Note that none of this necessarily reflects my personal opinion; it is just an alternative way to read the grandparent.)
It's disgusting how much we all rely on intuition and gut feelings when evaluating large swathes of technology. The internet hates perl so people criticize it without even using it. People will use what everybody else is using without actually trying out the options. There's too much information and so we must go with what others have said, and it all becomes hearsay. Keeping up is more like wizardry than engineering. Go with what the crowd says because I can't possibly install all of those libraries, play with the examples, and give my own evaluation. Hey look a new js framework just came out...
I used it quite a bit, and wrote some applications in it that are still in use a decade later. The "write-only" aspect of Perl is real -- it enables many styles and idioms, and as a result, tends to require reading knowledge of all of them. It also has some unusual design decisions (list vs. scalar context) and some hacks ("bless") that do not contribute to readability, especially for people who do not get to use Perl all day long. Python is a lot more readable, and does most of the same stuff.
What Perl did that was amazing was bring regular expressions "to the masses", and Perl-compatible regular expressions (PCRE) are still the de facto standard that most subsequent libraries have used (more or less).
"The internet" is an abstraction and doesn't hate (or love) anything. That itself is the kind of gross generalization you are criticizing. And one can criticize a language and still have respect for it.
Let's be honest -- people only use what everyone else is using because it lowers the bar to getting hired to 90+% of programming jobs.
I can count the number of tool-agnostic development teams that I've met on one hand. Many more have claimed they are when they are not.
If you aren't aiming for the top 10% of jobs (vague quality metric that you can interpret as you wish), then you want to have above-average knowledge of _just_ Python, Go, React, Docker and Kubernetes.
The situation only changes when high profile current/ex-Googlers (or similar) start talking about a language/tool a lot. Then the mass hops on board that train too.
I agree with the thrust of your comment but not the specifics. There are still a ton of Java and JavaScript (not necessarily React) jobs out there in the "bottom 90%".
And I don't think "high-profile" people talking has much effect. Paul Graham talked up lisp for a while, and I was certainly interested (I like lisps), but... there are still precious few jobs that use Lisp. Most people learn technologies that they either need currently or are in use in jobs they know of and might conceivably get.
Even at shops just hiring for Java or JavaScript, I don't think knowing them is a career advantage. They don't make you more hirable than you would be otherwise. That's really the only reason I left those out.
Paul Graham, while a thought-leader of sorts, isn't generally thought of as someone working at either the edge of tech or in large-scale systems. That's why nobody wants to chase the tech he's using vs what Google/Facebook/etc are.
There's something like necessary complexity you can't easily abstract away. I find Rust does a fine job at cleaning up syntax. I'm not a fan of snake_case. Other than that I can't think of anything that's more difficult than the underlying concept in Rust. And it's still close to C (braces and functions) and ML syntax (type after colon and variable name, let bindings) in many ways.
Especially compared with a similarly complex language like C++. Now that is scary syntax, if you're not used to it from 20 years of using C++ and developing Stockholm Syndrome.
Oh, don't be silly. Exactly the same point you're making about rust can be made about both C++ and perl by expert practitioners. Syntactic complexity is linear with expressive power, that's why it's complex to begin with. You just "like" rust, so you view it as a good tradeoff there and not in other languages that you "dislike".
My point above was that this decision is basically one of fashion and not technology. And the proof is how the general consensus about awk has evolved over 20 years as perl has declined.
Are you confusing syntax with grammar? Rust has a large grammar—many reserved words, many compile-time macros, etc.—but not too much in the way of syntax (e.g. novel punctuational operators; novel kinds of literals; etc.)
C++ and Perl, meanwhile, both have tons and tons of syntax, such that they're 1. harder to grasp for people who haven't seen them before, and 2. harder to learn (especially by attempting to Google language features "by name.")
If there was a spectrum with Lisp [or Forth] on one end and APL on the other, Rust might be somewhere right-of-center... but it'd still be pretty far left of C++ and Perl.
Also, given the languages that occupy the ends of said spectrum, I think it should be clear that your position on said spectrum has no correspondence with "expressive power" :)
Awk has evolved? GNU Awk has---somewhat. It has a clumsy facility in the place of a proper FFI called the "extension API" for binding to C libraries. You need to compile C code to use it, and the API has been a moving target. It has a way to work with XML. It has an "in place" editing mode, and other things like bignum integer support via GMP (requiring the -M option to be used). Plus various minor extensions to POSIX, like full regex support in RS (record separator), a way to tokenize fields positively rather than delimit by matching their separators, a way to extract fixed-width fields, an @include mechanism, a way to indirect on functions and such. None of it adds up to very much.
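For instance, the record-separator extension: POSIX treats RS as a single character, while gawk accepts a full regex, so records can be delimited by arbitrary patterns (file name hypothetical):

gawk 'BEGIN { RS = "---+\n" } { print "record " NR ": " $1 }' notes.txt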
I like C++, but it's not the language that you would design if you hadn't accumulated so much cruft over the years. Rust didn't have to support C compatibility and compatibility with earlier C++ standards. So Rust could be designed properly for modern use cases.
There's necessary complexity to express certain concepts, but C++ has accumulated a lot of unnecessary complexity over the years.
Rust, in terms of GC-less languages with mostly zero-cost abstractions is the simplest language that I've seen. And having a GCless language with memory safety is not just fashion. It's pretty much the greatest single advancement in language design since the GC itself.
> Syntactic complexity is linear with expressive power
Have you taken a serious look at a lisp? You might be pleasently surprised. Everything that can be said about lisp has probably already been said, but I'd argue that sexprs have a much higher expressive/complexity ratio than say the C++ grammar.
Second this. Perl replaced both awk and sed. Data scientists marvel at awk/sed but Perl is somehow forgotten. Perl may not be as suited to writing complex programs, but for text processing tasks, it is much more elegant than awk or sed.
That's the first time I'm hearing Perl being praised for its elegance of all things. Elegance is certainly in the eye of the beholder, but by default is understood in the context of programming languages as "containing only a minimal amount of syntax constructs". By that measure, Perl is spectacularly/absurdly bad with its "sigils" and "there's more than one way" idioms. In fact, I find Perl one of the ugliest languages of all time.
Edit: a-ok, probably elegance is meant in the same sense that C is elegant by cramming everything in a single statement using pre-/post-increment operators and assignments-as-expressions
One has to recall that there were some forces that wanted Perl to become a standard shell rather than a programming language. A shell is usually more limited in features, but it is frequently very forgiving, provides many shortcuts, and there are often multiple ways of doing things.
However, I've never believed it to be possible to have a language both as a shell language and a proper programming language for large-scale projects. I believe the two usecases are fundamentally antithetical, but I'd be happy to be proven wrong.
> However, I've never believed it to be possible to have a language both as a shell language and a proper programming language for large-scale projects. I believe the two usecases are fundamentally antithetical, but I'd be happy to be proven wrong.
I'd say Powershell proves you right. Powershell has a great design, it has optional typing and access to a cornucopia of libraries via .NET.
Even so, they had to make some compromises because of the shell parts (functions return output, for example), which makes it quite finicky as a "proper" programming language.
On the shell side, the very nice nomenclature that makes it readable and discoverable sometimes makes it annoying to use as a shell. That, and the somewhat unwieldy launching of non-Powershell commands.
Someone who attempts to bridge the two has a ton of work to do, both in the research and in the implementation department. I guess Oil Shell (https://www.oilshell.org/) is the most realistic approach we have today. And it's probably still 1-2 years away from release and many more years from mass adoption (if that ever happens).
yeah.. I like the `s///` syntax in sed/perl more than the `sub/gsub` syntax.. plus the regex support is lacking in awk.. no backreferences (gawk provides them, but only in the replacement section).. and perl has non-greedy, lookarounds, code in the replacement section, etc
other nice features I'd like in awk are `tr` and `join`
Why add 'tr' and 'join' to awk when they exist on their own?
That's part of why people avoid perl. It's very capable, but that wide scope is counter to the unix philosophy that prefers simple, focused utilities that can be combined in pipelines.
you could ask why have sub/gsub when there is sed... that's because you need it for a specific field or string in addition to other processing.. similarly, having tr for a specific string/field is useful..
I meant join as in perl's join - to construct a string out of array values with a specified separator
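For example, a sed-style substitution confined to one field, which sed itself cannot express easily (hypothetical data file):

awk '{ gsub(/foo/, "bar", $3); print }' data.txt   # substitute only inside field 3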
I think Perl's niche is sort of its downfall. For example, I used to work at a company with 70% C, 25% shell script, and 5% Perl. Any time I ran into a Perl script I had to switch my brain into Perl mode, with the understanding that what I was working on would be just as good or better in C or shell. I had nothing against Perl as a language, but always enjoyed exorcising a Perl script from the codebase.
What I like about AWK is that it is described by a few pages of the POSIX standard. If Perl was that ubiquitous and simple, I would prefer it over AWK.
To be clear: I like that about it too. But in the real world, we want tooling that does more stuff. In the mid-90's, "everyone" knew perl, because it was the best choice for problems in this space. And in a world where everyone knows perl, there is no place for awk.
But now we live in a world where no one knows perl, and languages like python and javascript are clumsy and weird in this space. And that makes awk look clever and elegant.
All I'm saying is that in perl-world (which was a real place!), awk wasn't clever and elegant, it was stale and primitive. And that to me is more surprising than awk's cleverness.
I think people underestimate the importance of medium-powered tools. Perl is great, but awk's limitations make it easier to write in a way that the next person can maintain.
Eventually the cycle will repeat again. The moment you will have non trivial text work to do, awk will have to give way to Perl.
The rise of Python was really about the use case of dealing with standard interfaces like DBs/XML/JSON becoming common. Python hasn't actually replaced Perl in any meaningful way.
Most modern distributions write their tools in Python instead of Perl. Proof: the really long transition process Fedora (and therefore RHEL), Debian and Ubuntu went through to migrate from Python 2 to 3. They could have done it faster if not for their system tools written in Python 2.
In the web space, Python is not huge, but it definitely supplanted the niche Perl used to have. No one I know writes web stuff in Perl anymore.
I totally see your point. And in Perl-world, I would probably use Perl too – I mean, if both tools are equally ubiquitous, why not use the most powerful one?
I entered the field at the tail end of the Perl era, so I've only toyed with it a long time ago.
I hated perl in the mid-90's, and stuck with grep/awk/sed instead, because it was a little better documented and structured.
Meanwhile, if you randomly mashed on your keyboard, it would output a perl script.
I jumped to Python as soon as I found out about it in the late 90's, because it was exactly what I was looking for in self-documenting structuring. It was great for creating parsers with state-machines. I was also the user of my scripts, and I didn't want to have to relearn what I coded a year or so earlier. Python let me pick that up, and to a lesser extent, awk. State machine programming was really self-documenting in Python.
That is one thing I like better in Python than in Perl. In Perl, I was having difficulty with nested data structures and then realized this was something I did in Python daily without even knowing it was a thing on a conscious level. Who hasn't made some weird dictionary with its values being lists or tuples or something like that?
This is exactly backwards to my eyes. Perl's autovivification behavior (assigning through an empty reference autocreates the object of the right type) makes nested data much, much cleaner.
Who hasn't made some weird dictionary with its values being lists and forgotten to check for and write code to create it when it doesn't exist, and had to fix that later after a runtime crash? In perl that's literally covered by the one-line assignment you used to set the field you parsed.
This is why it's sad that everyone's forgotten perl.
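(awk, for what it's worth, offers the same convenience in the flat case: array elements spring into existence on first use, so a word count needs no existence checks at all. Input file hypothetical:)

awk '{ count[$1]++ } END { for (w in count) print w, count[w] }' words.txt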
Defaultdicts to the rescue! But I think it should be an explicit choice. If you only intend to have lists at some keys, and then accidentally mistype a key, it shouldn't (in my opinion) silently create a list, effectively hiding the bug.
Most of awk's 'advanced and hard to do' use cases in Perl go like
open(my $FILEHANDLE, '<', $file)
    or die "Cannot open $file: $!\n";
while (<$FILEHANDLE>) {
    chomp;
    # Do your stuff here
}
close($FILEHANDLE);
Also perl can do various other things awk just can't. For example, removing something from a file and then talking to a database or a web service, or doing other stuff like parsing JSON or XML. Dealing with multiple files, or other advanced use cases. Unicode work, advanced regexes, etc. etc.
In fact the whole point of Perl was Larry Wall reaching the upper limits of what one could do with awk, sed and other utilities used all over the place alongside C, and then realizing there was a use case for a whole new language.
Correct me if I'm wrong, but this will only be line based won't it? As far as I can tell there is no equivalent of FS/RS/OFS/ORS that make awk the record (not line) based language it is.
Perl simply isn’t worth it. In all the decades of programming, I’ve yet to run into a problem which could only be solved in Perl because it couldn’t be done in AWK.
And I would hereby like to remind you that every computing problem is fundamentally an input-output problem, and because of this intrinsic property, it is possible to reduce all problems in computing to input-processing-output.
Which is exactly the kind of problem AWK is designed to address.
And AWK doesn’t work with lines, it works on records, for which the fathers of the language cleverly chose the default of ‘\n’, which is reconfigurable.
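For example, setting RS to the empty string switches awk into paragraph mode, where blank-line-separated blocks become the records (file name hypothetical):

awk 'BEGIN { RS = ""; FS = "\n" } { print "record " NR " has " NF " lines" }' file.txt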
"In all the decades of programming, I've yet to run into a problem which could only be solved in Perl because it couldn't be done in AWK."
Have you ever pondered the number of projects that, in addition to sh, make and common base utilities, require perl during compilation where awk could have sufficed?
As a single example, have you looked at compiling openssl without perl, using awk instead?
Whenever I see a perl prerequisite I question whether it is truly a requirement or whether other base utilities [1] such as awk could replace it.
Assuming it could be removed, how much effort is one willing to expend in order to extinguish a perl dependency?
1. Some OS projects like OpenBSD make perl a base utility.
The illumos project undertook a massive effort to eradicate any and all dependencies on Perl (which had also been made part of the core operating system decades prior, by themselves no less). While they're still at it, they have managed to rip out most of it and replace it with binary executables in C or shell.
Yes, writing a build engine in AWK would be perfectly doable, but the right tool for that job is Make.
But that's not the end of it. The whole point of Perl is to avoid a salad of C and shell utils. Also in many cases, the moment you have to deal with >2 files at a time, shell utilities begin to show their limits.
The resulting code is often far more unreadable than anything you will ever write in Perl.
I’m sorry you believe that, but it’s simply not true, especially since UNIX is designed to be driven by a combination of binary code and shell executables. Perl is a mess whose syntax borders on hieroglyphic once poured into software. That doesn’t mean that it’s not a capable language; but it’s not a language in which written programs are maintainable or easily debuggable.
That, and awk programs are almost perfectly portable across awk implementations: nawk, gawk, mawk (can't vouch for busybox-awk as I haven't tested it, though I've heard it's good as well).
This is of practical importance for portable scripts, since Debian uses mawk [1], RH ships gawk, and Mac OS uses nawk.
[1] mawk has recently received new commits by its original author after a hiatus of 20 years or so; see https://github.com/mikebrennan000
AWK seems far more modern than Perl, though. Defining a function with, you know, "function", and actually having parameters with names, feels like a 21st-century language. Calling things "sub" and having them work off an array called @_ doesn't. Yes, I know there are packages to add named parameters to Perl, plus Perl 6 has them out of the box, but it is weird that things went backwards in a lot of ways when Perl replaced AWK in the mid-1990s.
Until you find that it has no local variables other than function parameters. In a function, local variables can be simulated by defining additional arguments ... and not passing those. Everything that isn't a function parameter is a global!
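The conventional workaround looks like this: extra parameters that callers never pass, set off by extra whitespace purely as a signal to the reader:

function sum(arr, n,    i, total) {
    for (i = 1; i <= n; i++)
        total += arr[i]
    return total    # i and total acted as locals; nothing leaked into globals
}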
Awk is stupidly confused whether it is a two-namespace or one-namespace language (think Lisp-2 vs. Lisp-1). For instance, this works:
function x()
{
print "x called"
}
function foo(x)
{
x() # function x, not parameter x.
}
but in other situations the two spaces are conflated. For instance sin = 3 is a syntax error because sin is the built-in sinusoid function, even though you're using it as a variable, which shouldn't interfere with function use like sin(3.14).
Actually what you show is that there is a clear rule. First, the symbol is treated as a function. If it is not in the function space, then it is used as a variable name. To make a finer distinction awk would need some form of local variable declaration, which it clearly hasn't.
> First, the symbol is treated as a function. If it is not in the function space, then it is used as a variable name.
That is simply not the case. x() unambiguously treats x as a function, and will fail if there is no such function, even if there is a variable x.
$ awk 'function foo(x)
{
x()
}
BEGIN { foo(42) }'
awk: cmd. line:2: fatal: function `x' not defined
> To make a finer distinction awk would need some form of local variable declaration, which it clearly hasn't.
Also not the case. The purely syntactic context distinguishes whether or not the identifier is being used as a variable or function. Awk sometimes uses it, sometimes not. This doesn't diagnose:
function x()
{
}
function foo(x)
{
x() # function call allowed
x = 3 # assignment allowed
}
BEGIN { foo(42) }
But:
function foo(x)
{
}
BEGIN { foo = 3 }
fatal: function `foo' called with space between name and `(',
or used as a variable or an array
Why isn't it a problem that the function x() is used as a variable?
And Perl is 1000% better than awk/sed. When my company switched to Unix-based systems, we all got sent on a week's crash course that included sed and awk.
I have only used sed once, to edit the passwd file on an early Linux system when vi wasn't installed, and awk never. Though having been trained in sed and awk helped with picking up Perl.
Python was never intended to replace Perl. Perl was designed to extract stuff from text files. Python was designed as a scripting language for system programming. IM(Seldom Humble)O Python beat Perl for two reasons (a) batteries included (CPAN became a clusterfuck) (b) C API lead to stuff like SciPy, NumPy and Pandas.
FWIW I've used both Perl and Python professionally and Python rules.
Perl is a little older than Python. Most of the ideology of Python was a reaction against Perl (https://wiki.python.org/moin/TOOWTDI). Python has always tried to be as good as Perl at everything. I use Perl very much for what it was intended: parsing files (often log files and configuration files), producing reports and launching commands (shell replacement). I have tried to use Python for the same tasks, in particular when some of the files were in XML (XML parsing in Python is nicer than in Perl). Regular expression usage is easier in Perl, whereas multithreading is easier in Python. IMHO, the main handicap of Python vs Perl is the lack of autovivification. Python is very good for teaching (whiteboard interviews) or as a scripting language around some big libraries (like TensorFlow). At my work, the small glue scripts are almost always in shell or in Perl. The Python applications are being rewritten in Java because of maintenance issues (mainly caused by the lack of proper typing). Python does not rule here.
You might want to try Ruby for some of those things you reached to Python for. It takes direct inspiration from Perl and does a lot of those things better, IMO.
Nokogiri is hands-down the best tool for dealing with XML that there is.
I have much the same experience with Python and anything significant developed in Python here ends up getting rewritten in Go.
Not to move the goalposts, but that's part of what makes Python so great to me. It reads like pseudocode if written with that goal in mind. It's more like having a conversation with the computer. When I need performance, I now have the algorithm written out in an easy-to-parse way. Then small parts in golang/C are far faster to write, because I can feel out the whole program.
I'm not directly in the software industry though, just writing programs for data analysis in quality and safety programs. I'm sure once you can think directly in something like Go, it'd be faster to write that program first. But it decreases the cognitive load for me.
No worries -- I actually agree with you. We actually don't do very much scripting at my company in general -- Java, Go and JS (for Lambda) are roughly what we've standardized on.
We have a bunch of Python scripts for operations work and it's rare that performance becomes a major concern (...except in/regarding Ansible). Development isn't my team's primary responsibility so this state of things is fine -- our SysEng team can grok Python pretty well, whereas with other languages I wouldn't say this is true.
That is probably true with regard to its adoption, but, as a Perl fan and user, I have to say that in addition, the limitations of Perl 5's syntax begin to show once you get into collections of collections, and passing them to functions. If you are doing this sort of thing every day, it becomes second nature, but as a casual, ad-hoc user, it is no longer in my working memory.
Maybe Perl 6 fixes these things, but learning it is too far down on my to-do list, where it sits just below Ruby.
If I have a problem that can be solved by looping through the lines of a text file and applying some combination of regular expression matching and substitution, the split and join functions, simple arrays and hash tables, I reach for Perl 5.
The reason I personally left Perl was the extremely hostile community. When someone asked a simple question, not only would they get hauled over the coals and called names, but so would anyone who tried to help them.
Python seemed to do a much better job at onboarding new people. To me, that seems like the most important reason they won in the long run.
that is a very abnormal experience. Larry Wall has personally helped me with at least two junior issues over the years by just randomly answering in general forums.
I don't have links handy, but there were official blog posts and such about how they endeavored to fix the problem. Of course I'm not saying it was universal, nor trying to take away from the fact that there are a lot of great people in the Perl community (PerlMonks is a good example)
They did fix it, as far as I can tell anyway, but it was too little too late.
Did it?
Not used Perl for a while but I never needed to worry about backward compatibility and versions with CPAN, while I find I do with Python (to be fair I am programming more complex stuff than I did in my Perl days).
I've used Python for years and a little Perl. CPAN works pretty well when used with ActiveState... not sure about Unix. Python on Windows stinks when using pip: there is some Visual Studio C++ library dependency you need that isn't even available anymore. With Python it is best to get a large distribution that bundles everything already, such as Anaconda... otherwise it's not worth the hassle.
IMO Python won primarily because its syntax was more English-like, which makes it a lot easier and more readable for regular people who never worked with shell and Linux; perl's $#@%>> and regexps confused the hell out of them. That and the boom of PHP killed Perl 5's popularity, while Python got its foot in the door in science/finance niches and survived.
Perl5 and perl6 are very different languages - however after perl5 there was nobody left to listen... (Disclaimer: I got used to perl5 and I still think it's quite reasonable)
If you are doing it for personal growth and learning, then Perl 6. It is an interesting language that cleanly supports many major programming paradigms including functional, imperative and object oriented and, most probably, others. Additionally, it has many nifty features built in such as concurrency, grammars and rationals. Aside from that, it is also fun to use and has a great community.
If you are doing it for immediate application in a job or wish to acquire a skill you think might be required in a job then the answer is probably Perl 5. Despite the noise about Perl, it is still used a lot in many industries for flow control and inter-tool format conversion as well as many other applications. Many, many people understand perl and use it for rapid development of automation tools. I can tell you for a fact that it is pervasive in the semiconductor industry and is practically a job requirement, there.
Perl6 is mostly complete. Even though it has been getting significant weekly performance improvements (they have a weekly summary of these things), it still has a long way to go. It has a lot of really cool things such as simple to use concurrency, grammars, gradual typing...etc. It is a big language, but not super hard to learn. A bunch of books have been released on it in the past two years. Perl5 is used by companies now. Some are choosing to write new software in it, but there is a lot of legacy apps.
Remember that they are two different languages with some similarities. Perl6 has really good support for making big apps I would say as it has really good OO support out of the box in addition to making FP concepts easy too. Perl5 has atrocious support baked in, but they have the excellent libraries that everyone uses to give them world class support for OO. Still, it feels more natural in Perl6.
On another note, Groovy is another great and fast language that runs on the JVM and works very well in the scripting space with a lot of DSLs for GUI, XML, DB...cool stuff.
> On another note, Groovy is another great and fast language that runs on the JVM and works very well in the scripting space
But the same versioning issue exists for Apache Groovy (2 or 3 ?) that exists for Perl (5 or 6 ?). The Groovy PMC seem to be handling the issue, though, by slowing down the development of Groovy 3 to a crawl so it won't ever ship.
Oh, and calling Groovy "fast" is a bit of a stretch. It hardly matters anyway because scripting for classes written in Java doesn't need "fast".
In many ways, p6 is the superior language. But as an AWK competitor specifically, p5 might be the better choice for performance reasons alone (while p6 has been improving, as far as regex performance goes, it just isn't there yet [1]). You might even be able to write tighter code in p5 (for one, p6 regexes return objects, not strings, so in cases where you actually want the latter, you'll have to throw in boilerplate).
Perl6 has grammars: https://docs.perl6.org/language/grammar_tutorial (recursive regex with sane syntax), which are much better than plain regex in many cases. Otherwise modern Perl5 is much more practical.
> it always blows my mind that so few people really know how to use it at a decent level
I'll second this -- it's by far the most used swiss-army knife in my arsenal[0]. A little part of me dies when I see scripts doing grep|awk, using awk simply to print a column. After taking some time to actually read the gawk user manual a few years ago, I've found that I can do most things in awk that I was previously using grep/sed (and to a lesser extent) perl/python for in the shell. In my environment, it also comes with the benefit that I know the awk code I'm writing is supported by the version of gawk that's installed on every server/computer I come into contact with -- no having to ensure the right python/perl is available.
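For illustration, the grep|awk pattern collapses into awk's own condition, so one process does the work of two (file and pattern hypothetical):

grep ERROR app.log | awk '{ print $3 }'   # the version that makes a little part of me die
awk '/ERROR/ { print $3 }' app.log        # the same thing in awk alone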
One thing I wish there was better documentation on, though, is creating complete gawk scripts. With @include directives coupled with AWKPATH, it's really convenient to build up a handful of utility functions that do frequent things, and I've found, on several occasions, that I end up writing an .awk script instead of a bash/zsh script and it ends up being a much more straightforward set of code.
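A sketch of that layout, with hypothetical file names (gawk searches the directories listed in the AWKPATH environment variable for @include files):

# util.awk -- a shared helper somewhere on AWKPATH
function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

# report.awk -- run as: gawk -f report.awk data.txt
@include "util.awk"
{ print trim($2) }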
I think my tendency to reach for grep/sed/awk might be the thing that led me away from software development full-time and into the ops world.
Too many times I encountered 200+ line Python scripts that I could replace with an Awk one-liner and a cron job.
My lazy habit in my Ruby work eventually became to parse JSON/XML/whatever into usable text, shelling out to sed/awk and working with the results. This would not only save me LoC, but was less error-prone.
I think my hiccup is that it seems like good awk programmers are expected to be able to pass the full bloody program in as a simple string argument in a bash one-liner. Which is clearly not tenable.
There are a few things I have done which felt more difficult than they should. However, the vast majority of dealing with structured data using awk really has been a pleasant surprise.
Awk did have this reputation of hard-to-read one liners, but it was also always possible to write readable code in it much as you would do in any language.
My favorite bit of Awk code that I ever wrote was this print formatting program for the HP LaserJet II:
It printed source code in a "2-up" format, with two pages printed side by side in landscape mode. It looked for the kind of separators that I and many people were fond of back in those days (the source code is a good example) and converted those to graphic boxes. And it tried to be smart about things like breaking pages before a comment box, not after.
I wish I still had a sample printout handy. For 300 lines of what I thought was fairly cleanly written Awk code, it made some nicely formatted printouts.
I think some of the examples in popular documentation use awk -f to read a program from a file, so I don't feel like that's regarded as an unreasonable way to run awk code.
It's true that many programs might well just have a single action (or a single action plus a BEGIN and END), but I'm not sure that resorting to awk -f will give great offense.
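And a shebang line removes even the need for -f: mark the file executable and it runs like any other command (a hypothetical example):

#!/usr/bin/awk -f
# sum.awk: run as ./sum.awk data.txt to print the sum of column 1
{ total += $1 }
END { print total }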
I gotta say, I really enjoyed skimming through this, and I'm saving it for future reference. I can already tell there are some basic things that I should've memorized with all the one-liners I've used, but most were cobbled together for the immediate task (and many online forum searches assume a lot of prior knowledge), so this presentation and cookbook-style reference really help me hold the practical benefits in memory a lot longer. Thanks again!
if you are interested in text processing in general, do check the other chapters for grep/sed/perl/sort/paste/pr/etc
I learned a lot writing them up and marvel at the speed and functionality provided by these tools.. for many cases, it is a lot easier to write a bash script combining these tools than to write a Perl/Python/Ruby script..
Ahh, what a useful link! Bookmarking immediately. I use AWK like once a year but when I do, I usually struggle a lot. Your examples are really helpful!
One thing I also really like about Awk is that you can keep 99% of the language in your head. With Awk it is seldom necessary to even reference the man page. I don't have to interrupt myself with looking up the syntax or the order of the parameters to a function.
Agreed. I learned awk a bit late in my Linux journey (started using Linux in 1999, learned awk in 2016), but I can't imagine not using it daily now. Great tool to manipulate data, and it's great for when you want a Unix tool that doesn't exist (e.g., computing the sum or mean of a stream of numbers).
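e.g. that sum/mean tool is a one-liner:

seq 1 10 | awk '{ s += $1 } END { print "sum", s, "mean", s / NR }'   # prints: sum 55 mean 5.5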
awk is a fundamental part of the unix ecosystem. It's in the hall of fame of utils along with the likes of grep. If you use unix or linux in any semi-serious capacity, you are familiar with awk.
I use AWK daily; it’s my workhorse programming language and automation tool. Small, fast, no dependencies and portable.
Extremely powerful programming language for big data processing and data-driven programming; I often use it to generate shell scripts based on some arbitrary data. With no manual memory management and providing hash arrays, AWK is an absolute delight to program in. Using it with the functional programming paradigm makes it even more powerful. It blows my mind that Aho, Weinberger and Kernighan managed to design something so versatile and yet so small and fast. I’ve also been using it to replace Python scripts, with the Python:AWK ratio being anywhere from 3:1 to 10:1 as far as lines of code needed to get the same tasks done.
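The shell-script-generation trick, as a minimal sketch assuming a hypothetical two-column file of old/new names:

awk '{ printf "mv %s %s\n", $1, $2 }' renames.txt | sh

(Dropping the `| sh` first lets you eyeball the generated commands before running them.)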
I'll just share my biggest Awk-based project here: A tool that creates HTML documentation from Markdown comments in your source code.
It does essentially what Javadoc does, except it uses Markdown to format text, which I find much more pleasing to the eyes when you read source code.
The benefit of doing it in Awk is that if you want to use it in your project, you can just distribute a single script with your source code and add two lines to your Makefile. Because of the ubiquity of Awk, you never have to worry whether the people building your library have the correct tools installed.
It doesn't have all the features that more sophisticated tools like Doxygen provides, but I'm going to keep on doing the documentation for my small hobby projects this way.
It is true that there exists an awk most everywhere, but I've been burned a few times by the discovery that nawk (Solaris (dating myself a bit there...)), gawk (many GNU/Linux distros), and mawk (some GNU/Linux distros? Don't know where I ran into it, but I have in the last couple of years) all have subtle incompatibilities around the edges.
As I recall, gawk in particular has some extensions that are explicitly a superset of the standard functionality. Which is great if you're only targeting gawk and know about it. It's less great if you think you're only targeting gawk, and discover later on that there's a system with a different awk you need to support.
Your project looks neat, by the way. I'm looking forward to taking a closer look.
Yes, there are differences. I used only standard awk facilities, and tested my project with gawk, mawk and nawk.
I've also been burned when I discovered that Raspbian ships with an older version of mawk that did something differently in the way it processed regexen that caused my script to break.
Very cool. But, are there environments which have awk that don't have perl5? Even most commercial unices back in the late 90s shipped some kind of perl.
I had this exact book and used awk on DOS/Novell in the early 90's when scripting choices were pretty scarce. The writing is tremendous - a model of clarity, and worth reading just for that. Anything with Kernighan, Pike or Plauger as author is worth checking out just for the example of clear thinking.
In 1996 I worked as a federal contractor for a US Army base. They had different Unix systems locked down for security reasons. Had Awk and Sed to work with and ordered the books from Amazon.
Oracle and other databases exported data in fixed-width files, and I had to download from several *nix systems to import into one general *nix system running Oracle, and then into a DOS-based Clipper 5 system and an Access 2.0 Windows system, and they all had to get the same results.
If not for Awk I could not filter the files from the Nix systems.
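For fixed-width exports like those, gawk's FIELDWIDTHS extension splits records by column position rather than by delimiter (widths and file name hypothetical here):

gawk 'BEGIN { FIELDWIDTHS = "10 8 20" } $2 ~ /ACTIVE/ { print $1 }' export.dat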
I printed this book and went through it and imho just skimming through all of it is worth it: just understanding how it works beyond the basic '{print $2}' is immensely worth it, and being exposed to some 'advanced' techniques gives you a set of techniques that you can reuse in your daily chores (in particular if you're a sysadmin).
This book is worth the read. Just to get in the mindset of the authors. I wish that more programming books could be as concise and useful at the same time.
> "Unlike Awk, the awk macro is a robust, self-contained language feature which can be used anywhere where a TXR Lisp expression is called for, cleanly nests with itself and can produce a return value when done. By contrast, a function in the Awk language, or an action body, cannot instantiate an local Awk processing machine. "
The manual contains a translation of all of the Awk examples from the POSIX standard:
The (-> name form ...) syntax above is scoped to the surrounding awk macro. Like in Awk, the redirection is identified by string. If multiple such expressions appear with the same name, they denote the same stream (within the lexical scope of the awk macro instance to which they belong). These are implicitly kept in a hash table. When the macro terminates (normally or via non-local jump like an exception), these streams are all closed.
For context, I'm in university, but during one of my internships, a lot of the older developers always seemed to use awk/sed in really powerful ways. At the same time, I noticed a lot of the younger developers hardly used it.
I'm not sure if it's a generational thing, but I thought that was interesting.
Anyways, are there any good resources to learn awk/sed effectively?
Awk really has been superseded by Perl (and therefore arguably by Python, Ruby, etc.) But sed remains a thing of beauty, all its own, and very well worth learning. Hardly a day goes by that I don't use it in some one-off command like
for i in *.png; do pngtopnm < $i | cjpeg > `echo $i | sed 's/png$/jpeg/'`; done
Over the years I've written too many awk one-liners to count. Most of them look ugly - hell, awk makes Perl look elegant - but having awk in your toolkit means that you don't have to drop out of the shell to extract some weird shit out of a text stream. Thanks Aho, Weinberger and Kernighan!
And I'm still waiting for the structural regular expressions version of awk [0].
I very much like awk; I prefer it over sed because it's easy to read. Also, a proper man page is all one needs. But I find myself many times doing something like this:
Awk is great for quick command line scripts and also for running on a very wide range of systems.
I recently wrote a simple statistics tool using Awk to calculate median, variance, deviation, etc. and people say the code is readable and good for seeing the simplicity of Awk.
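A minimal sketch of that kind of tool, assuming one number per line and gawk's asort for the median:

{ x[NR] = $1; sum += $1 }
END {
    mean = sum / NR
    for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
    variance = ss / (NR - 1)
    n = asort(x)    # gawk extension; sorts values into x[1..n]
    median = n % 2 ? x[(n + 1) / 2] : (x[n / 2] + x[n / 2 + 1]) / 2
    printf "mean %g, variance %g, sd %g, median %g\n", mean, variance, sqrt(variance), median
}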
In my perfect world mawk would have some of the gawk extensions, and it would have a csv reader mode to properly split csv into $1...$NF. Because that would be the killer tool.
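gawk gets part of the way there already: its FPAT extension defines what a field is rather than what separates fields, which copes with commas inside quotes (simple cases only; file name hypothetical):

gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $3 }' data.csv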
That is a nice book. Starting with a practical tutorial and going into the structure and language features afterwards on a reasonable page count of just about 200 pages.
I like to use awk when I need something a little more powerful than grep. Nevertheless, when I look at the examples and where the book is heading I prefer R for many of the tasks (in particular Rscript with a shebang).
Just to give an example: if you have to manipulate a CSV file, that would most certainly be possible with awk, but some day there might be a record which does contain the separator, and your program will produce garbage. R, on the other hand, comes with sophisticated algorithms to handle CSV files correctly.
I truly respect awk for what it was and is, but I also think that the use cases where it is the best tool for the job have become very narrow over time.
As I do most of my daily work in cheminformatics with a (shell-based) workflow engine (http://scipipe.org), awk has turned out to be the perfect way of defining even quite complicated components based on just a shell command. These days, pretty much 50% of my components are CSV/TDV data munging with awk! :D
I find awk so beautiful. I've written many scripts in awk. It is so good at data transformation. I used it to write a script to delete old and unused records from tables. The book is so beautifully written, with amazing clarity of thought.
I recently had a sort of "contest" with someone for parsing the output of a tool. I had to parse some text output into a tree structure.
The other person wrote it in awk, quite quickly. After writing my own version in Python (my version was waaaay over-engineered), I decided to blatantly rip-off the awk solution and re-implement it in Python.
It was almost as simple and as short.
Awk is much more compact as a language, but also way more limited. And it still has its quirks and a certain volume of information you have to gather. I'd say it's more worthwhile to learn Python instead, because you'll be able to use it for other purposes.
> Because it's a description of the complete language, the material is detailed, so we recommend that you skim it, then come back as necessary to check up on details.
Any book that recommends skimming is doing something right.
Interesting, why not just replace the text then? Where can I find more info about this? I was actually trying to find a good source of info about PDF's recently and couldn't really find much.
Replacing the typeset text with any reasonable fidelity seems like a much harder problem than reproducing the scan and providing the ocr'ed text content. It might still be a good idea to do, maybe some software does this.
The lovely and humbling thing about this is that it was written three decades ago and the examples still work. Makes me think of another short, elegant piece by Kenneth Ward Church called "Unix for Poets" which shows how to use core UNIX utils to work with text. Also from the mid-to-late 80s. Perl may have replaced sed and awk, but they endure.
Lol. This is the IT equivalent of the shaggy-dog story about the itemized million-dollar invoice: $1 - drill a hole, $999,999 - knowing where to drill the hole.
And by that I mean sometimes it seems easy to solve a problem because you have the skill to do so. It looks easy but that's only because of the time invested in making it easy for you. For anyone else the challenge remains.
I always found it interesting that the Awk paradigm is also the basis for IBM's RPG language. Two very different environments coming up with basically the same elegant solution for the same problem:
1. Run zero or more setup operations.
2. Loop over the lines of a text file and process its columns into an output format.
3. Run zero or more cleanup operations at the end.
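Which maps one-to-one onto an awk program's three sections:

BEGIN { print "header" }          # 1. setup
      { total += $1 }             # 2. per-record processing
END   { print "total: " total }   # 3. cleanup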
This has bothered me for a long time. It looks like I'm seeing something other than what other people are seeing. So many of these documents just look awful. In this particular case it does not look good either:
Awk has its uses. If you use the command line you'll probably use Awk occasionally.
I don't get the Perl hate. Perl's unpopularity may have something to do with some of the language's design choices, but I think what really killed it was Perl coders. Some of the worst code I've seen happened to be written in Perl. If you follow clean-code principles Perl is fine. Mojolicious is an awesome framework. I like it a lot.
Today I code Python and C. I used to code Ruby and before that Perl. I loved Ruby's syntax but Ruby seems to be waning. I'm looking forward to coding in Go. I'll be coding Javascript but I'm not looking forward to it.
Use the tool that fits the job. I have no loyalties to any programming language.
> Tom Radcliffe recently presented a talk at YAPC North America titled “The Perl Paradox.” This concept refers to the fact that Perl has become virtually invisible in recent years while remaining one of the most important and critical modern programming languages. Little, if any, media attention has been paid to it, despite being ubiquitous in the backrooms of the enterprise.
> Yet at ActiveState, we have seen our Perl business continue to grow and thrive. Increasingly, our customers tell us that not only are they using more Perl but they’re doing more sophisticated things with it. Perl itself recently made it back into the Top 10 of the Tiobe rankings, and it remains one of the highest paying technologies. Therein lies the paradox.
With perl's use cases, I would be a bit surprised to see a job specifically for programming in perl. It's a tool that you use to support your other infrastructure. It's not a tool that you generally use to build up that infrastructure.
Similarly, I don't expect to see many jobs openings for wrenchers, but I fully expect a mechanic being hired someplace to be able to use a wrench.
> Awk has it's uses. If you use the command line you'll probably use Awk occasionally.
Agreed -- I use awk all the time in shell pipeline.
> I don't get the Perl hate.
The problem is not with writing Perl -- I don't mind using Perl to write scripts and tools. In fact, I like it better than Python for writing glue code.
Reading Perl code -- especially if the code base is old and has been worked upon by multiple people -- now that is a real pain. Don't get me wrong, I have seen some really well written Perl code and have had the good fortune of working with some really smart Perl programmers. But the majority of Perl code I have encountered has been unreadable mess that makes me want to pull my hair out. There are times when I feel it is more productive to rewrite the code in Python than to spend time on the existing code.
> If you follow clean code principles Perl is fine.
Agreed -- except most Perl coders don't. Worse, they make a large chunk of people who contributed to CPAN -- something which was one of the major reasons behind the popularity of Perl.
> Use the tool that fits the job. I have no loyalties to any programming language.
And yet most Perl code was never hard to read and understand, except for stuff like OOP, but OOP is hard in any language. In fact, the push for OOP is probably one of the notable things that contributed to Perl's decline.
Correct, because good luck trying to figure out what the hell someone else's Perl program is doing later.
I used to spend up to 40 hours in a Perl debugger trying to figure out what the program is doing, and after going 25 layers deep into the call stack, I'd come out one week later none the wiser as to what that damn spaghetti code was doing. Tracking down any bug would always turn into debugging of epic proportions. I never had such problems debugging machine code!
That is why Perl gets so much hate.
My favorite construct to hate was going through the logic and just as I was about finished trying to understand the state machine at that point hitting an unless{} clause which instantaneously wipes the slate clean. Oh how I hate Perl.
> Some of the worst code I've seen happened to be written in Perl
Second that. Worked with a guy who used (and adored) Perl for more than 15 years. His C++ code was so incomprehensible I had to rewrite many things in my spare time.
I would say hate is a strong word but I have 2 issues with Perl:
1) Regex choices made
2) Readability: if someone wrote something in Perl, I would normally rewrite it if that took less time than figuring out what they wrote when it wasn't working as expected. Maybe it was just my poor skills, but man, with Perl it can be hard to tell what is actually going on.
It's like shells or editors. Many older greybeards are mindful that not all tools are equal: most systems shipped some tools by default, and others you had to go and install yourself. Thus they became wise through experience, sticking to tools that did the job and would be more commonly available. Hence they all grew up with the likes of vi, sh and awk. They did the job and worked well. Alternatives became more accessible, and now most systems ship with every flavour of shell and editor, classing the likes of bash, ksh, csh... and emacs/perl as part of those essential common installs. This, and space/storage being less of a factor than it was decades ago.
It really gets down to personal taste now more than it did in the past, but if a tool works, does the job, and the alternatives don't offer any gains, then they stick with that. Not saying the alternatives are bad or worse in any way, or indeed better; they are just different and in some cases maybe better. At least the core reason of them always being there, instead of having to add another dependency and risk factor to a system, has become moot these days.
TL;DR: some older tools were more guaranteed to be on all systems as a lowest common denominator in the older days than now, and legacy always outlives the machine; hence we still run COBOL today, as it works. There may be better solutions, but rebuilding Rome overnight is still avoided.
I don’t think it’s just a mythical old greybeard thing. I’m 28 and I use vim for exactly this reason: a vi-like editor, not necessarily vim, is guaranteed to be available everywhere. I don’t have any particular reason to like vim over emacs, other than the fact that there is value in using standard tools.
My favorite OS, FreeBSD, comes with neither bash, nor emacs, nor Perl, so I don’t consider them part of the “lowest common denominator” set even in 2018.
(Funnily enough, it does come with a C++ compiler!)
I know many people will downvote, but in my opinion, just say no to this ancient "programming language". It's so confusing, completely text based, designed ages ago in an entirely different environment. There are many better alternatives, like Python or Powershell. Why not use them?