Hacker News

It took me two decades to finally decide to memorize "... | awk '{print $1}'" as a command-pipeline idiom for extracting the first column from stdout ("... | awk '{print $2}'" for the second column, and so on).

All it required from me was to intentionally type it out by hand instead of copy-pasting, on two purposeful occasions (within two minutes). Since then it's been saved in my brain.
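A minimal sketch of the idiom (the input lines here are just illustrative):

```shell
# Print the first whitespace-separated field of each line.
printf 'alpha 1\nbeta 2\n' | awk '{print $1}'
# prints: alpha, then beta

# And $2 for the second field:
printf 'alpha 1\nbeta 2\n' | awk '{print $2}'
# prints: 1, then 2
```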



I'd like to give a shoutout to jq for (non-JSON) text processing, and also as an almost-replacement for awk:

  echo "foo bar" | jq -Rr 'split(" ")[0]' # prints foo
I say 'almost', because jq currently can't invoke subprocesses, unlike awk and bash. But otherwise, it's a fully fledged functional language.
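By contrast, awk can read the output of an external command through its pipe-to-getline mechanism (a minimal sketch; the commands are just illustrative):

```shell
# In awk, "cmd" | getline var runs cmd in a shell and reads one
# line of its output into var.
echo "foo" | awk '{ "echo hello" | getline greeting; print $1, greeting }'
# prints: foo hello
```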


Same! And jq's regex functions are quite powerful for transforming text into something more structured:

  $ echo -e "10:20: hello\n20:39: world" | jq -cR 'capture("^(?<timestamp>.*): (?<message>.*)$")'
  {"timestamp":"10:20","message":"hello"}
  {"timestamp":"20:39","message":"world"}


  $ echo -e "10:20: hello\n20:39: world" | jq -Rn '[inputs | capture("(?<t>.*): (?<m>.*)$").m] | join(" ")'
  "hello world"
Also, using `inputs/0` etc. in combination with `reduce` and `foreach`, it's possible to process streams of text that might never end.
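For instance, a running line counter sketched with `foreach` (with an unbounded stream, each result is emitted as soon as its line arrives; the input here is finite only for illustration):

```shell
# State starts at 0 and increments per line; the third argument to
# foreach emits one output per input line.
printf 'a\nb\nc\n' | jq -Rrn 'foreach inputs as $line (0; . + 1; "\(.): \($line)")'
# prints: 1: a, 2: b, 3: c (one per line)
```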


`jq -Rr 'split(" ")[0]'`? That would easily take me three decades to memorize ;-).

For selecting the second column (the PID) from `ps aux`,

    ps aux | awk '{print $2}'
works for me, as gawk's default field separator ($FS) treats a run of whitespace as a single separation between fields ( https://www.gnu.org/software/gawk/manual/html_node/Field-Spl... ), quite similar to "standard" Open Group/POSIX awk ( https://pubs.opengroup.org/onlinepubs/9799919799/utilities/a... ).

    ps aux | jq -Rr 'split(" ")[1]'
on the other hand doesn't work here, because `ps aux` pads its columns with a variable number of spaces.

    ps aux | jq -Rr 'split("\\s+"; null)[1]'
seems to work though.


Yes, in awk, conveniently, whitespace means a run of whitespace. It gets a bit verbose in jq to get the same effect; but on the other hand, you get slicing: more slices than `cut` can give you.
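For example, jq's array slices and negative indices can pick ranges of fields, including counting from the end (a hedged sketch with illustrative input):

```shell
# Collect regex-split fields into an array, then slice out fields 2-3.
echo "a b c d" | jq -Rr '[splits("\\s+")] | .[1:3] | join(" ")'
# prints: b c

# Negative index: the last field.
echo "a b c d" | jq -Rr '[splits("\\s+")][-1]'
# prints: d
```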


These comments about more or less clever text-processing tools, each with its own syntax, feel like archaic hacks. Using something like Nushell or PowerShell makes this trivial.

E.g. for the `ps aux` example, using PowerShell, selecting a certain column:

  gps | select id
The output of gps is an array of objects. Selecting a property does this for all elements in the array.


People do this, but it feels a bit weird to start a whole new language interpreter which is much more powerful than the pitiful shell language we use, just to split a field. Why not write your whole program in awk, then? It's likely more efficient anyway.

The canonical shell tool to split fields is cut. Easy to use, simple to read.

For the very common use case of assigning fields to variables, use the shell built-in read. It uses the same IFS as the rest of the shell.
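A minimal POSIX-shell sketch of both (the input lines are illustrative):

```shell
# cut: select the second colon-separated field.
echo "root:x:0" | cut -d ':' -f 2
# prints: x

# read: split a line into variables using IFS (the braces keep
# read and echo in the same subshell of the pipeline).
printf 'alpha beta gamma\n' | { read -r first second rest; echo "$second"; }
# prints: beta
```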


Why not `… | cut -f1 -d " "` though?


Because awk understands "columns" better.

    echo '999 888' | awk '{print $2}'
prints 888


How is that different from

    echo '999 888' | cut -f2 -d " "
except for the default delimiter being space for awk?


   echo '999  888' | cut -f2 -d " " # notice two consecutive spaces
prints an empty line, because cut treats the text between the two consecutive spaces as an (empty) second field.


There are invalid column separators for awk too. To handle consecutive spaces in cut you have to squeeze them first:

  echo '999  888' | tr -s " " | cut -f2 -d " "


Right, that would be expected though? I suppose awk is better for parsing formatted (padded) output.


How can it be "better" when it is exactly the same?


Speaking personally, I find that I always use awk for printing (numbered) fields in shell scripts, and when typing quick ad-hoc commands at the prompt.

The main reason for this is that it's easy to add a grep-like step there, without invoking another tool. So I can add a range test, a literal test, or something more complex than I could do with grep.

In short I can go from:

       awk '{ print $2 }'
To this, easily:

       awk '$1 == "en4:" {print $NF}'
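A range test in the same spirit (a hedged sketch; the input is illustrative):

```shell
# Print the first field only for rows whose second field is >= 10,
# with no separate grep step needed.
printf 'a 5\nb 12\nc 30\n' | awk '$2 >= 10 {print $1}'
# prints: b, then c
```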


Yet using cut, even with extra piping through tr or grep first, is probably more than twice as fast as awk, which may or may not matter depending on what you're doing.


Now imagine what all the LLM copy-pasting could do to our skills...



