I consider awk to be the most useful and underused language in the UNIX ecosystem. I use it daily to analyze, transform, and assemble data, and it always blows my mind that so few people really know how to use it at a decent level. This is an excellent book to give a real idea of what awk is capable of.
> it always blows my mind that so few people really know how to use it at a decent level
Not nearly as surprising as it is to me that now that most developers have forgotten perl, they're turning to awk as an inspiring example of a bygone era.
Seriously: perl basically replaced awk in the mid-90's. It absorbed all the great lessons and added two or three dozen innovations. But it had scary syntax, so everyone used python (which did not replace awk very effectively) and forgot perl. So now we're back at awk. And the scary syntax is all in the Rust world.
Perl deprecates awk & sed [0]. Whereas Python is a bad substitute for those, and because of that, we're back on Awk. I have the feeling that Python simply succeeded as it seems to be a language designed for people that do not enjoy programming, which sadly seem to be the majority in our industry.
The Perl hate is so prevalent in most companies, that you can get into serious issues should you even write a one-liner in it. So here I am, reluctantly using awk/sed and asking too much of Bash/Python.
Ooof. I have to respectfully disagree that perl deprecates awk. If I'm writing awk, it's usually because what I want to do is precisely what awk does by default: read a line, match it against a condition, and perform some action.
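That default loop means a complete program can be just a pattern and an action; a minimal sketch (file name hypothetical):

awk '/ERROR/ { print $2 }' app.log   # for each line matching ERROR, print field 2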
I could do all that in perl, but it's more stuff I'd need to dig up every time I wanted to do something awk-like. People who code perl regularly already have it, for sure. It's a barrier for those of us who don't, and one that awk doesn't have.
Awk isn't perfect, for sure, but I've transitioned from writing simple text processing in perl to doing it in awk because awk provides the basic framework that I'd end up doing ad-hoc and non-idiomatically in perl every time.
Again, all this with that caveat that I'm not a perl programmer. I used it for a couple of semesters in college, which was enough to be dangerous, but not truly proficient.
absolutely shell. Perl acts as glue between multiple processes very nicely. $foo = `foo_command`; is great. Opening named pipes. Python is more awkward because it's less like shell.
If one writes a Perl one-liner where AWK will do, that’s avarice.
I used to maintain and debug a relatively large and an extremely complex build engine in Perl for a living for several years, and there was no construct in there that could not have been easily written in AWK.
Consider which language is sold to people-off-the-street as "How to Program!": it's Python. Most of these people do not enjoy the cognitive effort, detailed typing and symbolic work, algorithmic thinking, architectural thinking, and linguistic thinking that is "programming". At best, they enjoy the end result, but not the process.
Python taught in this way, at least, fools people into thinking that programming is about something far simpler and more toylike than it is. Many "hard" disciplines do this: sell children and lay people on toy experiments that momentarily captivate but have no relationship to what a practitioner of that discipline does.
The older system of recruitment was to polarize hard, i.e., to throw people in at the deep end to scare off all the people who would waste time/resources in training. Whoever remained really wanted to do that thing (e.g. physics, programming, ...).
Today we're doing something perhaps vaguely immoral: selling people on a career that has no relationship to the sales pitch.
To me, that sounds like gatekeeping - as in "it's only programming when it's insanely difficult".
Somebody playing football in a Sunday pub league is still playing football, and still loves playing football, even if they're not at the same standard as Lionel Messi.
For me at least, I feel the same about programming. I love understanding somebody's problem, and building a solution for them that solves that problem. I normally use PHP (and sometimes Python or Ruby or JavaScript) because they make it easier for me to focus on the problem, rather than language details. I can't always solve the problem because it's too difficult, and perhaps some of my solutions are not 'optimal'. But I feel hurt by the idea that because I don't have a strong understanding of how Python works at a really deep level, I'm not a real programmer.
I also think that's a great way to piss people off who are just getting into the industry, and may one day become great programmers - even Linus Torvalds was a junior once. I'd encourage them to keep going, keep learning, and keep helping people solve their problems (and getting paid good money for doing that).
It's only gatekeeping if people want to be past the gate. I'm not talking about deterring people who are interested.
The vast majority of people do not want to be programmers and would not enjoy programming. Delaying the moment they actually have to do something difficult with a programming language is not especially healthy.
How would you feel about a career in football sold to you on the basis of table pong? Keep playing the pong, and then one day, your face is in the dirt and you drop out.
The self-esteem hack psychology of the 60s-90s equated encouraging people with lying to them, as if the only way we can get programmers is by lying about what programming is about. This isn't encouraging anyone, it's lying to them.
I think you’re confusing “programmer” with “10x programmer”. Plenty of people are capable of implementing business logic in code. Very few are capable of designing that logic — they’re the 10x programmers who know all about data structures, algorithms, etc.
You can’t run a business expecting every employee to be a rockstar. It just doesn’t scale. So you skill it down and put the high-skill people where they can have the most impact.
It sounds like we have a different definition of programming. For me, programming is producing code that gets executed by another piece of hardware or software.
The complexity of the code is not part of the definition. Nor is your understanding of the hardware/software involved.
Perl isn’t difficult in the same way that the English language isn’t difficult. Easy to get started with, takes a long time to master. Fortunately for us die-hard perl types it’s quite capable of cleanly solving all normal dynamic language problems, and some seriously abnormal ones. And because of the insanely good backcompat in perl we’ll see the pendulum swing back to perl 5 some time in the next decade.
When I did my comp sci undergrad degree, languages per se were not part of what was taught in the classroom. Whatever language a class was using, be it assembler, Scheme, C or some other higher-level language, it was up to the student to figure out the syntax, how to run the compiler, etc. You got some help and basic examples in the discussion sessions, but it was not part of the lectures. And there was no web, no Stack Overflow, no Google at the time. You had a book about the language, some local newsgroups (and the larger USENET), and that was it.
If you didn't enjoy reading, puzzling, and banging your head against the wall you would wash out after a couple of classes.
What language do you think beginners should be learning that would not keep them from what you say is ‘cognitive effort, detailed typing and symbolic work, algorithmic thinking, architectural thinking, linguistic thinking’. Perl? And why is Python incompatible with those things?
Like many of us, I started programming with BASIC, not C or assembly, but I turned out fine. I don’t think people need to or necessarily can absorb all the high level details straight off. To me python seems well organized enough that it makes a great language for beginners.
As far as the second part of your comment, I think if you make topics like this impenetrably difficult and complex at first, it will not only filter out the non-dedicated, but also those who don’t yet know that they would like to be dedicated to it.
Note that it sounds like a snide remark but it doesn't have to be. One could interpret it as, "Do you want to simply get things done, and not bicker endlessly about microoptimizations or where the braces go? Python is for people who simply want to complete their tasks and then go home and spend their time doing more stuff they think is fun."
It's not necessarily a bad thing to be productive, instead of inventing new problems for yourself to keep having programming things to do.
(Note that none of this necessarily reflects my personal opinion; it is just an alternative way to read the grandparent.)
It's disgusting how much we all rely on intuition and gut feelings when evaluating large swathes of technology. The internet hates perl so people criticize it without even using it. People will use what everybody else is using without actually trying out the options. There's too much information and so we must go with what others have said, and it all becomes hearsay. Keeping up is more like wizardry than engineering. Go with what the crowd says because I can't possibly install all of those libraries, play with the examples, and give my own evaluation. Hey look a new js framework just came out...
I used it quite a bit, and wrote some applications in it that are still in use a decade later. The "write-only" aspect of Perl is real -- it enables many styles and idioms, and as a result, tends to require reading knowledge of all of them. It also has some unusual design decisions (list vs. scalar context) and some hacks ("bless") that do not contribute to readability, especially for people who do not get to use Perl all day long. Python is a lot more readable, and does most of the same stuff.
What Perl did that was amazing was bring regular expressions "to the masses", and Perl-compatible regular expressions (PCRE) are still the de facto standard that most subsequent libraries have used (more or less).
"The internet" is an abstraction and doesn't hate (or love) anything. That itself is the kind of gross generalization you are criticizing. And one can criticize a language and still have respect for it.
Let's be honest -- people only use what everyone else is using because it lowers the bar to getting hired to 90+% of programming jobs.
I can count the number of tool-agnostic development teams that I've met on one hand. Many more have claimed they are when they are not.
If you aren't aiming for the top 10% of jobs (vague quality metric that you can interpret as you wish), then you want to have above-average knowledge of _just_ Python, Go, React, Docker and Kubernetes.
The situation only changes when high profile current/ex-Googlers (or similar) start talking about a language/tool a lot. Then the mass hops on board that train too.
I agree with the thrust of your comment but not the specifics. There are still a ton of Java and JavaScript (not necessarily React) jobs out there in the "bottom 90%".
And I don't think "high-profile" people talking has much effect. Paul Graham talked up lisp for a while, and I was certainly interested (I like lisps), but... there are still precious few jobs that use Lisp. Most people learn technologies that they either need currently or are in use in jobs they know of and might conceivably get.
Even at shops just hiring for Java or JavaScript, I don't think knowing them is a career advantage. They don't make you more hirable than you would be otherwise. That's really the only reason I left those out.
Paul Graham, while a thought-leader of sorts, isn't generally thought of as someone working at either the edge of tech or in large-scale systems. That's why nobody wants to chase the tech he's using vs what Google/Facebook/etc are.
There's something like necessary complexity you can't easily abstract away. I find Rust does a fine job at cleaning up syntax. I'm not a fan of snake_case. Other than that I can't think of anything that's more difficult than the underlying concept in Rust. And it's still close to C (braces and functions) and ML syntax (type after colon and variable name, let bindings) in many ways.
Especially compared with a similarly complex language like C++. Now that is scary syntax, if you're not used to it from 20 years of using C++ and developing Stockholm Syndrome.
Oh, don't be silly. Exactly the same point you're making about rust can be made about both C++ and perl by expert practitioners. Syntactic complexity is linear with expressive power, that's why it's complex to begin with. You just "like" rust, so you view it as a good tradeoff there and not in other languages that you "dislike".
My point above was that this decision is basically one of fashion and not technology. And the proof is how the general consensus about awk has evolved over 20 years as perl has declined.
Are you confusing syntax with grammar? Rust has a large grammar—many reserved words, many compile-time macros, etc.—but not too much in the way of syntax (e.g. novel punctuational operators; novel kinds of literals; etc.)
C++ and Perl, meanwhile, both have tons and tons of syntax, such that they're 1. harder to grasp for people who haven't seen them before, and 2. harder to learn (especially by attempting to Google language features "by name.")
If there was a spectrum with Lisp [or Forth] on one end and APL on the other, Rust might be somewhere right-of-center... but it'd still be pretty far left of C++ and Perl.
Also, given the languages that occupy the ends of said spectrum, I think it should be clear that your position on said spectrum has no correspondence with "expressive power" :)
Awk has evolved? GNU Awk has---somewhat. It has a clumsy facility in the place of a proper FFI called the "extension API" for binding to C libraries. You need to compile C code to use it, and the API has been a moving target. It has a way to work with XML. It has an "in place" editing mode, and other things like bignum integer support via GMP (requiring the -M option to be used). Plus various minor extensions to POSIX, like full regex support in RS (record separator), a way to tokenize fields positively rather than delimit by matching their separators, a way to extract fixed-width fields, an @include mechanism, a way to indirect on functions and such. None of it adds up to very much.
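For instance, the record-separator extension: POSIX treats RS as a single character, while gawk accepts a full regex, so records can be delimited by arbitrary patterns (file name hypothetical):

gawk 'BEGIN { RS = "---+\n" } { print "record " NR ": " $1 }' notes.txt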
I like C++, but it's not the language that you would design if you hadn't accumulated so much cruft over the years. Rust didn't have to support C compatibility and compatibility with earlier C++ standards. So Rust could be designed properly for modern use cases.
There's necessary complexity to express certain concepts, but C++ has accumulated a lot of unnecessary complexity over the years.
Rust, in terms of GC-less languages with mostly zero-cost abstractions is the simplest language that I've seen. And having a GCless language with memory safety is not just fashion. It's pretty much the greatest single advancement in language design since the GC itself.
> Syntactic complexity is linear with expressive power
Have you taken a serious look at a lisp? You might be pleasently surprised. Everything that can be said about lisp has probably already been said, but I'd argue that sexprs have a much higher expressive/complexity ratio than say the C++ grammar.
Second this. Perl replaced both awk and sed. Data scientists marvel at awk/sed but Perl is somehow forgotten. Perl may not be as suited to writing complex programs, but for text processing tasks, it is much more elegant than awk or sed.
That's the first time I'm hearing Perl being praised for its elegance of all things. Elegance is certainly in the eye of the beholder, but by default is understood in the context of programming languages as "containing only a minimal amount of syntax constructs". By that measure, Perl is spectacularly/absurdly bad with its "sigils" and "there's more than one way" idioms. In fact, I find Perl one of the ugliest languages of all time.
Edit: a-ok, probably elegance is meant in the same sense that C is elegant by cramming everything in a single statement using pre-/post-increment operators and assignments-as-expressions
One has to recall that there were some forces that wanted Perl to become a standard shell rather than a programming language. A shell is usually more limited in features, but it is frequently very forgiving, provides many shortcuts, and there are often multiple ways of doing things.
However, I've never believed it to be possible to have a language both as a shell language and a proper programming language for large-scale projects. I believe the two usecases are fundamentally antithetical, but I'd be happy to be proven wrong.
> However, I've never believed it to be possible to have a language both as a shell language and a proper programming language for large-scale projects. I believe the two usecases are fundamentally antithetical, but I'd be happy to be proven wrong.
I'd say Powershell proves you right. Powershell has a great design, it has optional typing and access to a cornucopia of libraries via .NET.
Even so, they had to make some compromises because of the shell parts (functions return output, for example), which makes it quite finicky as a "proper" programming language.
On the shell side, the very nice nomenclature that makes it readable and discoverable sometimes makes it annoying to use as a shell. That, and the somewhat unwieldy launching of non-Powershell commands.
Someone who attempts to bridge the two has a ton of work to do, both in the research and in the implementation department. I guess Oil Shell (https://www.oilshell.org/) is the most realistic approach we have today. And it's probably still 1-2 years away from release and many more years from mass adoption (if that ever happens).
yeah.. I like the `s///` syntax in sed/perl more than the `sub/gsub` syntax.. plus the regex support is lacking in awk.. no backreferences (gawk provides them, but only in the replacement section).. and perl has non-greedy, lookarounds, code in the replacement section, etc
other nice features I'd like in awk are `tr` and `join`
Why add 'tr' and 'join' to awk when they exist on their own?
That's part of why people avoid perl. It's very capable, but that wide scope is counter to the unix philosophy that prefers simple, focused utilities that can be combined in pipelines.
you could ask why have sub/gsub when there is sed... that's because you need it for a specific field or string in addition to other processing.. similarly, having tr for a specific string/field is useful..
I meant join as in perl's join - to construct a string out of array values with a specified separator
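For example, a sed-style substitution confined to one field, which sed itself cannot express easily (hypothetical data file):

awk '{ gsub(/foo/, "bar", $3); print }' data.txt   # substitute only inside field 3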
I think Perl's niche is sort of its downfall. For example, I used to work at a company with 70% C, 25% shell script, and 5% Perl. Any time I ran into a Perl script I had to switch my brain into Perl mode, with the understanding that what I was working on would be just as good or better in C or shell. I had nothing against Perl as a language, but always enjoyed exorcising a Perl script from the codebase.
What I like about AWK is that it is described by a few pages of the POSIX standard. If Perl was that ubiquitous and simple, I would prefer it over AWK.
To be clear: I like that about it too. But in the real world, we want tooling that does more stuff. In the mid-90's, "everyone" knew perl, because it was the best choice for problems in this space. And in a world where everyone knows perl, there is no place for awk.
But now we live in a world where no one knows perl, and languages like python and javascript are clumsy and weird in this space. And that makes awk look clever and elegant.
All I'm saying is that in perl-world (which was a real place!), awk wasn't clever and elegant, it was stale and primitive. And that to me is more surprising than awk's cleverness.
I think people underestimate the importance of medium-powered tools. Perl is great, but awk's limitations make it easier to write in a way that the next person can maintain.
Eventually the cycle will repeat again. The moment you will have non trivial text work to do, awk will have to give way to Perl.
The rise of Python was really about the use case of dealing with standard interfaces like DBs/XML/JSON becoming common. Python hasn't actually replaced Perl in any meaningful way.
Most modern distributions write their tools in Python instead of Perl. Proof: the really long transition process Fedora (and therefore RHEL), Debian and Ubuntu went through to migrate from Python 2 to 3. They could have done it faster if not for their system tools written in Python 2.
In the web space, Python is not huge, but it definitely supplanted the niche Perl used to have. No one I know writes web stuff in Perl anymore.
I totally see your point. And in Perl-world, I would probably use Perl too – I mean, if both tools are equally ubiquitous, why not use the most powerful one?
I entered the field at the tail end of the Perl era, so I've only toyed with it a long time ago.
I hated perl in the mid-90's, and stuck with grep/awk/sed instead, because it was a little better documented and structured.
Meanwhile, if you randomly mashed on your keyboard, it would output a perl script.
I jumped to Python as soon as I found out about it in the late 90's, because it was exactly what I was looking for in self-documenting structuring. It was great for creating parsers with state-machines. I was also the user of my scripts, and I didn't want to have to relearn what I coded a year or so earlier. Python let me pick that up, and to a lesser extent, awk. State machine programming was really self-documenting in Python.
That is one thing I like better in Python than in Perl. In Perl, I was having difficulty with nested data structures and then realized this was something I did in Python daily without even knowing it was a thing on a conscious level. Who hasn't made some weird dictionary with its values being lists or tuples or something like that?
This is exactly backwards to my eyes. Perl's autovivification behavior (assigning through an empty reference autocreates the object of the right type) makes nested data much, much cleaner.
Who hasn't made some weird dictionary with its values being lists and forgotten to check for and write code to create it when it doesn't exist, and had to fix that later after a runtime crash? In perl that's literally covered by the one-line assignment you used to set the field you parsed.
This is why it's sad that everyone's forgotten perl.
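(awk, for what it's worth, offers the same convenience in the flat case: array elements spring into existence on first use, so a word count needs no existence checks at all. Input file hypothetical:)

awk '{ count[$1]++ } END { for (w in count) print w, count[w] }' words.txt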
Defaultdicts to the rescue! But I think it should be an explicit choice. If you only intend to have lists at some keys, and then accidentally mistype a key, it shouldn't (in my opinion) silently create a list, effectively hiding the bug.
Most of awk's 'advanced and hard to do' use cases in Perl go like
open(my $FILEHANDLE, '<', $file)
    or die "Cannot open $file: $!\n";
while (<$FILEHANDLE>) {
    chomp;
    # Do your stuff here
}
close($FILEHANDLE);
Also perl can do various other things awk just can't. For example, removing something from a file and then talking to a database or a web service, or doing other stuff like parsing JSON or XML. Dealing with multiple files, or other advanced use cases. Unicode work, advanced regexes, etc. etc.
In fact the whole point of Perl was Larry Wall reaching the upper limits of what one could do with awk, sed and other utilities used all over the place alongside C, and then realizing there was a use case for a whole new language.
Correct me if I'm wrong, but this will only be line based won't it? As far as I can tell there is no equivalent of FS/RS/OFS/ORS that make awk the record (not line) based language it is.
Perl simply isn’t worth it. In all the decades of programming, I’ve yet to run into a problem which could only be solved in Perl because it couldn’t be done in AWK.
And I would hereby like to remind you that every computing problem is fundamentally an input-output problem, and because of this intrinsic property, it is possible to reduce all problems in computing to input-processing-output.
Which is exactly the kind of problem AWK is designed to address.
And AWK doesn’t work with lines, it works on records, for which the fathers of the language cleverly chose the default of ‘\n’, which is reconfigurable.
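For example, setting RS to the empty string switches awk into paragraph mode, where blank-line-separated blocks become the records (file name hypothetical):

awk 'BEGIN { RS = ""; FS = "\n" } { print "record " NR " has " NF " lines" }' file.txt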
"In all the decades of programming, I've yet to run into a problem which could only be solved in Perl because it couldn't be done in AWK."
Have you ever pondered the number of projects that, in addition to sh, make and common base utilities, require perl during compilation where awk could have sufficed?
As a single example, have you looked at compiling openssl without perl, using awk instead?
Whenever I see a perl prerequisite I question whether it is truly a requirement or whether other base utilities [1] such as awk could replace it.
Assuming it could be removed, how much effort is one willing to expend in order to extinguish a perl dependency?
1. Some OS projects like OpenBSD make perl a base utility.
The illumos project undertook a massive effort to eradicate any and all dependencies on Perl (which had also been made part of the core operating system decades prior, by themselves no less). While they're still at it, they have managed to rip out most of it and replace it with binary executables in C or shell.
Yes, writing a build engine in AWK would be perfectly doable, but the right tool for that job is Make.
But that's not the end of it. The whole point of Perl is to avoid a salad of C and shell utils. Also in many cases, the moment you have to deal with >2 files at a time, shell utilities begin to show their limits.
The resulting code is often far more unreadable than anything you will ever write in Perl.
I’m sorry you believe that, but it’s simply not true, especially since UNIX is designed to be driven by a combination of binary code and shell executables. Perl is a mess whose syntax borders on hieroglyphic once poured into software. That doesn’t mean that it’s not a capable language; but it’s not a language in which written programs are maintainable or easily debuggable.
That, and awk programs are almost perfectly portable across awk implementations: nawk, gawk, mawk (can't vouch for busybox-awk as I haven't tested it, though I've heard it's good as well).
This is of practical importance for portable scripts, since Debian uses mawk [1], RH ships gawk, and Mac OS uses nawk.
[1] mawk has recently received new commits by its original author after a hiatus of 20 years or so; see https://github.com/mikebrennan000
AWK seems far more modern than Perl, though. Defining a function with, you know, "function", and actually having parameters with names, feels like a 21st-century language. Calling things "sub" and having them work off an array called @_ doesn't. Yes, I know there are packages to add named parameters to Perl, plus Perl 6 has them out of the box, but it is weird that things went backwards in a lot of ways when Perl replaced AWK in the mid-1990s.
Until you find that it has no local variables other than function parameters. In a function, local variables can be simulated by defining additional arguments ... and not passing those. Everything that isn't a function parameter is a global!
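The conventional workaround looks like this: extra parameters that callers never pass, set off by extra whitespace purely as a signal to the reader:

function sum(arr, n,    i, total) {
    for (i = 1; i <= n; i++)
        total += arr[i]
    return total    # i and total acted as locals; nothing leaked into globals
}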
Awk is stupidly confused whether it is a two-namespace or one-namespace language (think Lisp-2 vs. Lisp-1). For instance, this works:
function x()
{
print "x called"
}
function foo(x)
{
x() # function x, not parameter x.
}
but in other situations the two spaces are conflated. For instance sin = 3 is a syntax error because sin is the built-in sinusoid function, even though you're using it as a variable, which shouldn't interfere with function use like sin(3.14).
Actually what you show is that there is a clear rule. First, the symbol is treated as a function. If it is not in the function space, then it is used as a variable name. To make a finer distinction awk would need some form of local variable declaration, which it clearly hasn't.
> First, the symbol is treated as a function. If it is not in the function space, then it is used as a variable name.
That is simply not the case. x() unambiguously treats x as a function, and will fail if there is no such function, even if there is a variable x.
$ awk 'function foo(x)
{
x()
}
BEGIN { foo(42) }'
awk: cmd. line:2: fatal: function `x' not defined
> To make a finer distinction awk would need some form of local variable declaration, which it clearly hasn't.
Also not the case. The purely syntactic context distinguishes whether or not the identifier is being used as a variable or function. Awk sometimes uses it, sometimes not. This doesn't diagnose:
function x()
{
}
function foo(x)
{
x() # function call allowed
x = 3 # assignment allowed
}
BEGIN { foo(42) }
But:
function foo(x)
{
}
BEGIN { foo = 3 }
fatal: function `foo' called with space between name and `(',
or used as a variable or an array
Why isn't it a problem that the function x() is used as a variable?
And Perl is 1000% better than awk/sed. When my company switched to Unix-based systems, we all got sent on a week's crash course that included sed and awk.
I have only used sed once, to edit the passwd file on an early Linux system when vi wasn't installed, and awk never. Though having been trained in sed and awk helped with picking up Perl.
Python was never intended to replace Perl. Perl was designed to extract stuff from text files. Python was designed as a scripting language for system programming. IM(Seldom Humble)O Python beat Perl for two reasons (a) batteries included (CPAN became a clusterfuck) (b) C API lead to stuff like SciPy, NumPy and Pandas.
FWIW I've used both Perl and Python professionally and Python rules.
Perl is a little older than Python. Most of the ideology of Python was a reaction against Perl (https://wiki.python.org/moin/TOOWTDI). Python has always tried to be as good as Perl at everything. I use Perl very much for what it was intended: parsing files (often log files and configuration files), producing reports and launching commands (shell replacement). I have tried to use Python for the same tasks, in particular when some of the files were in XML (XML parsing in Python is nicer than in Perl). Regular expression usage is easier in Perl, whereas multithreading is easier in Python. IMHO, the main handicap of Python vs Perl is the lack of autovivification. Python is very good for teaching (whiteboard interviews) or as a scripting language around some big libraries (like TensorFlow). At my work, the small glue scripts are almost always in shell or in Perl. The Python applications are being rewritten in Java because of maintenance issues (mainly caused by the lack of proper typing). Python does not rule here.
You might want to try Ruby for some of those things you reached to Python for. It takes direct inspiration from Perl and does a lot of those things better, IMO.
Nokogiri is hands-down the best tool for dealing with XML that there is.
I have much the same experience with Python and anything significant developed in Python here ends up getting rewritten in Go.
Not to move the goalposts, but that's part of what makes Python so great to me. It reads like pseudocode if written with that goal in mind. It's more like having a conversation with the computer. When I need performance, I now have the algorithm written out in an easy-to-parse way. Then small parts in golang/C are far faster to write, because I can feel out the whole program.
I'm not directly in the software industry though, just writing programs for data analysis in quality and safety programs. I'm sure once you can think directly in something like Go, it'd be faster to write that program first. But it decreases the cognitive load for me.
No worries -- I actually agree with you. We actually don't do very much scripting at my company in general -- Java, Go and JS (for Lambda) are roughly what we've standardized on.
We have a bunch of Python scripts for operations work and it's rare that performance becomes a major concern (...except in/regarding Ansible). Development isn't my team's primary responsibility so this state of things is fine -- our SysEng team can grok Python pretty well, whereas with other languages I wouldn't say this is true.
That is probably true with regard to its adoption, but, as a Perl fan and user, I have to say that in addition, the limitations of Perl 5's syntax begin to show once you get into collections of collections, and passing them to functions. If you are doing this sort of thing every day, it becomes second nature, but as a casual, ad-hoc user, it is no longer in my working memory.
Maybe Perl 6 fixes these things, but learning it is too far down on my to-do list, where it sits just below Ruby.
If I have a problem that can be solved by looping through the lines of a text file and applying some combination of regular expression matching and substitution, the split and join functions, simple arrays and hash tables, I reach for Perl 5.
The reason I personally left Perl was the extremely hostile community. When someone asked a simple question, not only would they get hauled over the coals and called names, but so would anyone who tried to help them.
Python seemed to do a much better job at onboarding new people. To me, that seems like the most important reason they won in the long run.
that is a very abnormal experience. Larry Wall has personally helped me with at least two junior issues over the years by just randomly answering in general forums.
I don't have links handy, but there were official blog posts and such about how they endeavored to fix the problem. Of course I'm not saying it was universal, nor trying to take away from the fact that there are a lot of great people in the Perl community (PerlMonks is a good example)
They did fix it, as far as I can tell anyway, but it was too little too late.
Did it?
Not used Perl for a while but I never needed to worry about backward compatibility and versions with CPAN, while I find I do with Python (to be fair I am programming more complex stuff than I did in my Perl days).
I've used Python for years and a little Perl. CPAN works pretty well when used with ActiveState... not sure about Unix. Python on Windows stinks when using pip: there is some Visual Studio C++ library dependency you need that isn't even available anymore. With Python it is best to get a large distribution that bundles everything already, such as Anaconda... otherwise it's not worth the hassle.
IMO Python won primarily because its syntax was more English-like, which makes it a lot easier and more readable for regular people who never worked with shell and Linux; perl's $#@%>> and regexps confused the hell out of them. That and the boom of PHP killed Perl 5's popularity, while Python got its foot in the door in science/finance niches and survived.
Perl5 and perl6 are very different languages - however after perl5 there was nobody left to listen... (Disclaimer: I got used to perl5 and I still think it's quite reasonable)
If you are doing it for personal growth and learning, then Perl 6. It is an interesting language that cleanly supports many major programming paradigms including functional, imperative and object oriented and, most probably, others. Additionally, it has many nifty features built in such as concurrency, grammars and rationals. Aside from that, it is also fun to use and has a great community.
If you are doing it for immediate application in a job or wish to acquire a skill you think might be required in a job then the answer is probably Perl 5. Despite the noise about Perl, it is still used a lot in many industries for flow control and inter-tool format conversion as well as many other applications. Many, many people understand perl and use it for rapid development of automation tools. I can tell you for a fact that it is pervasive in the semiconductor industry and is practically a job requirement, there.
Perl6 is mostly complete. Even though it has been getting significant weekly performance improvements (they have a weekly summary of these things), it still has a long way to go. It has a lot of really cool things such as simple to use concurrency, grammars, gradual typing...etc. It is a big language, but not super hard to learn. A bunch of books have been released on it in the past two years. Perl5 is used by companies now. Some are choosing to write new software in it, but there is a lot of legacy apps.
Remember that they are two different languages with some similarities. Perl6 has really good support for making big apps I would say as it has really good OO support out of the box in addition to making FP concepts easy too. Perl5 has atrocious support baked in, but they have the excellent libraries that everyone uses to give them world class support for OO. Still, it feels more natural in Perl6.
On another note, Groovy is another great and fast language that runs on the JVM and works very well in the scripting space with a lot of DSLs for GUI, XML, DB...cool stuff.
> On another note, Groovy is another great and fast language that runs on the JVM and works very well in the scripting space
But the same versioning issue exists for Apache Groovy (2 or 3 ?) that exists for Perl (5 or 6 ?). The Groovy PMC seem to be handling the issue, though, by slowing down the development of Groovy 3 to a crawl so it won't ever ship.
Oh, and calling Groovy "fast" is a bit of a stretch. It hardly matters anyway because scripting for classes written in Java doesn't need "fast".
In many ways, p6 is the superior language. But as an AWK competitor specifically, p5 might be the better choice for performance reasons alone (while p6 has been improving, as far as regex performance goes, it just isn't there yet [1]). You might even be able to write tighter code in p5 (for one, p6 regexes return objects, not strings, so in cases where you actually want the latter, you'll have to throw in boilerplate).
Perl6 has grammars: https://docs.perl6.org/language/grammar_tutorial (recursive regex with sane syntax), which are much better than plain regex in many cases. Otherwise modern Perl5 is much more practical.
> it always blows my mind that so few people really know how to use it at a decent level
I'll second this -- it's by far the most used swiss-army knife in my arsenal[0]. A little part of me dies when I see scripts doing grep|awk, using awk simply to print a column. After taking some time to actually read the gawk user manual a few years ago, I've found that I can do most things in awk that I was previously using grep/sed (and to a lesser extent) perl/python for in the shell. In my environment, it also comes with the benefit that I know the awk code I'm writing is supported by the version of gawk that's installed on every server/computer I come into contact with -- no having to ensure the right python/perl is available.
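For illustration, the grep|awk pattern collapses into awk's own condition, so one process does the work of two (file and pattern hypothetical):

grep ERROR app.log | awk '{ print $3 }'   # the version that makes a little part of me die
awk '/ERROR/ { print $3 }' app.log        # the same thing in awk alone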
One thing I wish there was better documentation on, though, is creating complete gawk scripts. With @include directives coupled with AWKPATH, it's really convenient to build up a handful of utility functions that do frequent things, and I've found, on several occasions, that I end up writing an .awk script instead of a bash/zsh script and it ends up being a much more straightforward set of code.
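A sketch of that layout, with hypothetical file names (gawk searches the directories listed in the AWKPATH environment variable for @include files):

# util.awk -- a shared helper somewhere on AWKPATH
function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

# report.awk -- run as: gawk -f report.awk data.txt
@include "util.awk"
{ print trim($2) }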
I think my tendency to reach for grep/sed/awk might be the thing that led me away from software development full-time and into the ops world.
Too many times I encountered 200+ line Python scripts that I could replace with an Awk one-liner and a cron job.
My lazy habit in my Ruby work eventually became to parse JSON/XML/whatever into usable text, shelling out to sed/awk and working with the results. This would not only save me LoC, but was less error-prone.
I think my hiccup is that it seems like good awk programmers are expected to be able to pass the full bloody program in as a simple string argument in a bash one-liner. Which is clearly not tenable.
There are a few things I have done which felt more difficult than they should. However, the vast majority of dealing with structured data using awk really has been a pleasant surprise.
Awk did have this reputation of hard-to-read one liners, but it was also always possible to write readable code in it much as you would do in any language.
My favorite bit of Awk code that I ever wrote was this print formatting program for the HP LaserJet II:
It printed source code in a "2-up" format, with two pages printed side by side in landscape mode. It looked for the kind of separators that I and many people were fond of back in those days (the source code is a good example) and converted those to graphic boxes. And it tried to be smart about things like breaking pages before a comment box, not after.
I wish I still had a sample printout handy. For 300 lines of what I thought was fairly cleanly written Awk code, it made some nicely formatted printouts.
I think some of the examples in popular documentation use awk -f to read a program from a file, so I don't feel like that's regarded as an unreasonable way to run awk code.
It's true that many programs might well just have a single action (or a single action plus a BEGIN and END), but I'm not sure that resorting to awk -f will give great offense.
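And a shebang line removes even the need for -f: mark the file executable and it runs like any other command (a hypothetical example):

#!/usr/bin/awk -f
# sum.awk: run as ./sum.awk data.txt to print the sum of column 1
{ total += $1 }
END { print total }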
I gotta say, I really enjoyed skimming through this, and I'm saving it for future reference. I can already tell there are some basic things that I should've memorized with all the one-liners I've used, but most were cobbled together for the immediate task (and many online forum searches assume a lot of prior knowledge), so this presentation and cookbook-style reference really help me hold the practical benefits in memory a lot longer. Thanks again!
if you are interested in text processing in general, do check the other chapters for grep/sed/perl/sort/paste/pr/etc
I learned a lot writing them up and marvel at the speed and functionality provided by these tools.. for many cases, it is a lot easier to write a bash script combining these tools than to write a Perl/Python/Ruby script..
Ahh, what a useful link! Bookmarking immediately. I use AWK like once a year but when I do, I usually struggle a lot. Your examples are really helpful!
One thing I also really like about Awk is that you can keep 99% of the language in your head. With Awk it is seldom necessary to even reference the man page. I don't have to interrupt myself with looking up the syntax or the order of the parameters to a function.
Agreed. I learned awk a bit late in my Linux journey (started using Linux in 1999, learned awk in 2016), but I can't imagine not using it daily now. Great tool to manipulate data, and it's great for when you want a Unix tool that doesn't exist (e.g., computing the sum or mean of a stream of numbers).
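e.g. that sum/mean tool is a one-liner:

seq 1 10 | awk '{ s += $1 } END { print "sum", s, "mean", s / NR }'   # prints: sum 55 mean 5.5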
awk is a fundamental part of the unix ecosystem. It's in the hall of fame of utils along with the likes of grep. If you use unix or linux in any semi-serious capacity, you are familiar with awk.
I use AWK daily; it’s my workhorse programming language and automation tool. Small, fast, no dependencies and portable.
Extremely powerful programming language for big data processing and data-driven programming; I often use it to generate shell scripts based on some arbitrary data. With no manual memory management and providing hash arrays, AWK is an absolute delight to program in. Using it with the functional programming paradigm makes it even more powerful. It blows my mind that Aho, Weinberger and Kernighan managed to design something so versatile and yet so small and fast. I’ve also been using it to replace Python scripts, with the Python:AWK ratio being anywhere from 3:1 to 10:1 as far as lines of code needed to get the same tasks done.
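The shell-script-generation trick, as a minimal sketch assuming a hypothetical two-column file of old/new names:

awk '{ printf "mv %s %s\n", $1, $2 }' renames.txt | sh

(Dropping the `| sh` first lets you eyeball the generated commands before running them.)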
I'll just share my biggest Awk-based project here: A tool that creates HTML documentation from Markdown comments in your source code.
It does essentially what Javadoc does, except it uses Markdown to format text, which I find much more pleasing to the eyes when you read source code.
The benefit of doing it in Awk is that if you want to use it in your project, you can just distribute a single script with your source code and add two lines to your Makefile. Because of the ubiquity of Awk, you never have to worry whether the people building your library have the correct tools installed.
It doesn't have all the features that more sophisticated tools like Doxygen provides, but I'm going to keep on doing the documentation for my small hobby projects this way.
It is true that there exists an awk most everywhere, but I've been burned a few times by the discovery that nawk (Solaris (dating myself a bit there...)), gawk (many GNU/Linux distros), and mawk (some GNU/Linux distros? Don't know where I ran into it, but I have in the last couple of years) all have subtle incompatibilities around the edges.
As I recall, gawk in particular has some extensions that are explicitly a superset of the standard functionality. Which is great if you're only targeting gawk and know about it. It's less great if you think you're only targeting gawk, and discover later on that there's a system with a different awk you need to support.
Your project looks neat, by the way. I'm looking forward to taking a closer look.
Yes, there are differences. I used only standard awk facilities, and tested my project with gawk, mawk and nawk.
I've also been burned when I discovered that Raspbian ships with an older version of mawk that did something differently in the way it processed regexen that caused my script to break.
Very cool. But, are there environments which have awk that don't have perl5? Even most commercial unices back in the late 90s shipped some kind of perl.
I had this exact book and used awk on DOS/Novell in the early 90's when scripting choices were pretty scarce. The writing is tremendous - a model of clarity, and worth reading just for that. Anything with Kernighan, Pike or Plauger as author is worth checking out just for the example of clear thinking.
In 1996 I worked as a federal contractor for a US Army base. They had different Unix systems locked down for security reasons. Had Awk and Sed to work with and ordered the books from Amazon.
Oracle and other databases exported data in fixed-width files, and I had to download from several *nix systems to import into one general *nix system running Oracle, and then into a DOS-based Clipper 5 system and an Access 2.0 Windows system, and they all had to get the same results.
If not for Awk I could not filter the files from the Nix systems.
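For fixed-width exports like those, gawk's FIELDWIDTHS extension splits records by column position rather than by delimiter (widths and file name hypothetical here):

gawk 'BEGIN { FIELDWIDTHS = "10 8 20" } $2 ~ /ACTIVE/ { print $1 }' export.dat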
I printed this book and went through it and imho just skimming through all of it is worth it: just understanding how it works beyond the basic '{print $2}' is immensely worth it, and being exposed to some 'advanced' techniques gives you a set of techniques that you can reuse in your daily chores (in particular if you're a sysadmin).
This book is worth the read. Just to get in the mindset of the authors. I wish that more programming books could be as concise and useful at the same time.
> "Unlike Awk, the awk macro is a robust, self-contained language feature which can be used anywhere where a TXR Lisp expression is called for, cleanly nests with itself and can produce a return value when done. By contrast, a function in the Awk language, or an action body, cannot instantiate an local Awk processing machine. "
The manual contains a translation of all of the Awk examples from the POSIX standard:
The (-> name form ...) syntax above is scoped to the surrounding awk macro. Like in Awk, the redirection is identified by string. If multiple such expressions appear with the same name, they denote the same stream (within the lexical scope of the awk macro instance to which they belong). These are implicitly kept in a hash table. When the macro terminates (normally or via non-local jump like an exception), these streams are all closed.
For context, I'm in university, but during one of my internships, a lot of the older developers always seemed to use awk/sed in really powerful ways. At the same time, I noticed a lot of the younger developers hardly used it.
I'm not sure if it's a generational thing, but I thought that was interesting.
Anyways, are there any good resources to learn awk/sed effectively?
Awk really has been superseded by Perl (and therefore arguably by Python, Ruby, etc.) But sed remains a thing of beauty, all its own, and very well worth learning. Hardly a day goes by that I don't use it in some one-off command like
for i in *.png; do pngtopnm < $i | cjpeg > `echo $i | sed 's/png$/jpeg/'`; done
Over the years I've written too many awk one-liners to count. Most of them look ugly - hell, awk makes Perl look elegant - but having awk in your toolkit means that you don't have to drop out of the shell to extract some weird shit out of a text stream. Thanks Aho, Weinberger and Kernighan!
And I'm still waiting for the structural regular expressions version of awk [0].
I very much like awk; I prefer it over sed because it's easy to read. Also, a proper man page is all one needs. But I find myself many times doing something like this:
Awk is great for quick command line scripts and also for running on a very wide range of systems.
I recently wrote a simple statistics tool using Awk to calculate median, variance, deviation, etc. and people say the code is readable and good for seeing the simplicity of Awk.
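A minimal sketch of that kind of tool, assuming one number per line and gawk's asort for the median:

{ x[NR] = $1; sum += $1 }
END {
    mean = sum / NR
    for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
    variance = ss / (NR - 1)
    n = asort(x)    # gawk extension; sorts values into x[1..n]
    median = n % 2 ? x[(n + 1) / 2] : (x[n / 2] + x[n / 2 + 1]) / 2
    printf "mean %g, variance %g, sd %g, median %g\n", mean, variance, sqrt(variance), median
}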
In my perfect world mawk would have some of the gawk extensions, and it would have a csv reader mode to properly split csv into $1...$NF. Because that would be the killer tool.
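gawk gets part of the way there already: its FPAT extension defines what a field is rather than what separates fields, which copes with commas inside quotes (simple cases only; file name hypothetical):

gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $3 }' data.csv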
That is a nice book. Starting with a practical tutorial and going into the structure and language features afterwards on a reasonable page count of just about 200 pages.
I like to use awk when I need something a little more powerful than grep. Nevertheless, when I look at the examples and where the book is heading I prefer R for many of the tasks (in particular Rscript with a shebang).
Just to give an example: if you have to manipulate a CSV file, that would most certainly be possible with awk, but some day there might be a record which does contain the separator, and your program will produce garbage. R, on the other hand, comes with sophisticated algorithms to handle CSV files correctly.
I truly respect awk for what it was and is, but I also think that the use cases where it is the best tool for the job have become very narrow over time.
As I do most of my daily work in cheminformatics with a (shell-based) workflow engine (http://scipipe.org), awk has turned out to be the perfect way of defining even quite complicated components based on just a shell command. These days, pretty much 50% of my components are CSV/TDV data munging with awk! :D
I find awk so beautiful. I've written many scripts in awk. It is so good at data transformation. I used it to write a script to delete old and unused records from tables. The book is so beautifully written, with amazing clarity of thought.
I recently had a sort of "contest" with someone for parsing the output of a tool. I had to parse some text output into a tree structure.
The other person wrote it in awk, quite quickly. After writing my own version in Python (my version was waaaay over-engineered), I decided to blatantly rip-off the awk solution and re-implement it in Python.
It was almost as simple and as short.
Awk is much more compact as a language, but also way more limited. And it still has its quirks and a certain volume of information you have to gather. I'd say it's more worthwhile to learn Python instead, because you'll be able to use it for other purposes.
> Because it's a description of the complete language, the material is detailed, so we recommend that you skim it, then come back as necessary to check up on details.
Any book that recommends skimming is doing something right.
Interesting, why not just replace the text then? Where can I find more info about this? I was actually trying to find a good source of info about PDF's recently and couldn't really find much.
Replacing the typeset text with any reasonable fidelity seems like a much harder problem than reproducing the scan and providing the ocr'ed text content. It might still be a good idea to do, maybe some software does this.
The lovely and humbling thing about this is that it was written three decades ago and the examples still work. Makes me think of another short, elegant piece by Kenneth Ward Church called "Unix for Poets" which shows how to use core UNIX utils to work with text. Also from the mid-to-late 80s. Perl may have replaced sed and awk, but they endure.
Lol. This is the IT equivalent of the shaggy-dog story about the itemized million-dollar invoice: $1 - drill a hole, $999,999 - knowing where to drill the hole.
And by that I mean sometimes it seems easy to solve a problem because you have the skill to do so. It looks easy but that's only because of the time invested in making it easy for you. For anyone else the challenge remains.
I always found it interesting that the Awk paradigm is also the basis for IBM's RPG language. Two very different environments coming up with basically the same elegant solution for the same problem:
1. Run zero or more setup operations.
2. Loop over the lines of a text file and process its columns into an output format.
3. Run zero or more cleanup operations at the end.
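Which maps one-to-one onto an awk program's three sections:

BEGIN { print "header" }          # 1. setup
      { total += $1 }             # 2. per-record processing
END   { print "total: " total }   # 3. cleanup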
This has bothered me for a long time. It looks like I'm seeing something other than what other people are seeing. So many of these documents just look awful. In this particular case it does not look good either:
Awk has its uses. If you use the command line you'll probably use Awk occasionally.
I don't get the Perl hate. Perl's unpopularity may have something to do with some of the language's design choices, but I think what really killed it was Perl coders. Some of the worst code I've seen happened to be written in Perl. If you follow clean-code principles Perl is fine. Mojolicious is an awesome framework. I like it a lot.
Today I code Python and C. I used to code Ruby and before that Perl. I loved Ruby's syntax but Ruby seems to be waning. I'm looking forward to coding in Go. I'll be coding Javascript but I'm not looking forward to it.
Use the tool that fits the job. I have no loyalties to any programming language.
> Tom Radcliffe recently presented a talk at YAPC North America titled “The Perl Paradox.” This concept refers to the fact that Perl has become virtually invisible in recent years while remaining one of the most important and critical modern programming languages. Little, if any, media attention has been paid to it, despite being ubiquitous in the backrooms of the enterprise.
> Yet at ActiveState, we have seen our Perl business continue to grow and thrive. Increasingly, our customers tell us that not only are they using more Perl but they’re doing more sophisticated things with it. Perl itself recently made it back into the Top 10 of the Tiobe rankings, and it remains one of the highest paying technologies. Therein lies the paradox.
With perl's use cases, I would be a bit surprised to see a job specifically for programming in perl. It's a tool that you use to support your other infrastructure. It's not a tool that you generally use to build up that infrastructure.
Similarly, I don't expect to see many jobs openings for wrenchers, but I fully expect a mechanic being hired someplace to be able to use a wrench.
> Awk has it's uses. If you use the command line you'll probably use Awk occasionally.
Agreed -- I use awk all the time in shell pipeline.
> I don't get the Perl hate.
The problem is not with writing Perl -- I don't mind using Perl to write scripts and tools. In fact, I like it better than Python for writing glue code.
Reading Perl code -- especially if the code base is old and has been worked upon by multiple people -- now that is a real pain. Don't get me wrong, I have seen some really well written Perl code and have had the good fortune of working with some really smart Perl programmers. But the majority of Perl code I have encountered has been unreadable mess that makes me want to pull my hair out. There are times when I feel it is more productive to rewrite the code in Python than to spend time on the existing code.
> If you follow clean code principles Perl is fine.
Agreed -- except most Perl coders don't. Worse, they make a large chunk of people who contributed to CPAN -- something which was one of the major reasons behind the popularity of Perl.
> Use the tool that fits the job. I have no loyalties to any programming language.
And yet most Perl code was never hard to read and understand, except for stuff like OOP, but OOP is hard in any language. In fact, the push for OOP is probably one of the notable things that contributed to Perl's decline.
Correct, because good luck trying to figure out what the hell someone else's Perl program is doing later.
I used to spend up to 40 hours in a Perl debugger trying to figure out what the program is doing, and after going 25 layers deep into the call stack, I'd come out one week later none the wiser as to what that damn spaghetti code was doing. Tracking down any bug would always turn into debugging of epic proportions. I never had such problems debugging machine code!
That is why Perl gets so much hate.
My favorite construct to hate was going through the logic and just as I was about finished trying to understand the state machine at that point hitting an unless{} clause which instantaneously wipes the slate clean. Oh how I hate Perl.
> Some of the worst code I've seen happened to be written in Perl
Second that. Worked with a guy who used (and adored) Perl for more than 15 years. His C++ code was so incomprehensible I had to rewrite many things in my spare time.
I would say hate is a strong word but I have 2 issues with Perl:
1) Regex choices made
2) Readability: if someone wrote something in Perl, I would normally rewrite it if that took less time than figuring out what they wrote when it wasn't working as expected. Maybe it was just my poor skills, but man, with Perl it can be hard to tell what is actually going on.
It's like shells or editors. Many older greybeards are mindful that not all tools are equal: most systems shipped some tools by default, and others you had to go and install yourself. Thus they became wise through experience, sticking to tools that did the job and would be more commonly available. Hence they all grew up with the likes of vi, sh and awk. They did the job and worked well. Alternatives became more accessible, and now most systems ship with every flavour of shell and editor, classing the likes of bash, ksh, csh... and emacs/perl as part of those essential common installs. This, and space/storage being less of a factor than it was decades ago.
It really gets down to personal taste now more than it did in the past, but if a tool works, does the job, and the alternatives don't offer any gains, then they stick with that. Not saying the alternatives are bad or worse in any way, or indeed better; they are just different and in some cases maybe better. At least the core reason of them always being there, instead of having to add another dependency and risk factor to a system, has become moot these days.
TL;DR: some older tools were more guaranteed to be on all systems as a lowest common denominator in the older days than now, and legacy always outlives the machine; hence we still run COBOL today, as it works. There may be better solutions, but rebuilding Rome overnight is still avoided.
I don’t think it’s just a mythical old greybeard thing. I’m 28 and I use vim for exactly this reason: a vi-like editor, not necessarily vim, is guaranteed to be available everywhere. I don’t have any particular reason to like vim over emacs, other than the fact that there is value in using standard tools.
My favorite OS, FreeBSD, comes with neither bash, nor emacs, nor Perl, so I don’t consider them part of the “lowest common denominator” set even in 2018.
(Funnily enough, it does come with a C++ compiler!)
I know many people will downvote, but in my opinion, just say no to this ancient "programming language". It's so confusing, completely text based, designed ages ago in an entirely different environment. There are many better alternatives, like Python or Powershell. Why not use them?