Hacker News

It took me two decades to finally decide to memorize "... | awk '{print $1}'" as a command-pipeline idiom for extracting the first column from stdout ("... | awk '{print $2}'" for the second column, and so on).

All it required from me was to intentionally type it out by hand instead of copy-pasting, on two purposeful occasions (within two minutes). Since then it's been saved in my brain.
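A minimal sketch of the idiom (the input lines here are just illustrative):

```shell
# Print the first whitespace-separated field of each line.
printf 'alpha 1\nbeta 2\n' | awk '{print $1}'
# prints: alpha, then beta

# And $2 for the second field:
printf 'alpha 1\nbeta 2\n' | awk '{print $2}'
# prints: 1, then 2
```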



I'd like to give a shoutout to jq for (non-JSON) text processing, and also as an almost-replacement for awk:

  echo "foo bar" | jq -Rr 'split(" ")[0]' # prints foo
I say 'almost', because jq currently can't invoke subprocesses, unlike awk and bash. But otherwise, it's a fully fledged functional language.
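By contrast, awk can read the output of an external command through its pipe-to-getline mechanism (a minimal sketch; the commands are just illustrative):

```shell
# In awk, "cmd" | getline var runs cmd in a shell and reads one
# line of its output into var.
echo "foo" | awk '{ "echo hello" | getline greeting; print $1, greeting }'
# prints: foo hello
```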


Same! And jq's regex functions are quite powerful for transforming text into something more structured:

  $ echo -e "10:20: hello\n20:39: world" | jq -cR 'capture("^(?<timestamp>.*): (?<message>.*)$")'
  {"timestamp":"10:20","message":"hello"}
  {"timestamp":"20:39","message":"world"}


  $ echo -e "10:20: hello\n20:39: world" | jq -Rn '[inputs | capture("(?<t>.*): (?<m>.*)$").m] | join(" ")'
  "hello world"
Also, using `inputs/0` etc. in combination with `reduce` and `foreach`, it's possible to process streams of text that might never end.
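For instance, a running line counter sketched with `foreach` (with an unbounded stream, each result is emitted as soon as its line arrives; the input here is finite only for illustration):

```shell
# State starts at 0 and increments per line; the third argument to
# foreach emits one output per input line.
printf 'a\nb\nc\n' | jq -Rrn 'foreach inputs as $line (0; . + 1; "\(.): \($line)")'
# prints: 1: a, 2: b, 3: c (one per line)
```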


`jq -Rr 'split(" ")[0]'`? That would easily take me three decades to memorize ;-).

For selecting the second column (the PID) from `ps aux`,

    ps aux | awk '{print $2}'
works for me, as gawk's default field separator ($FS) treats a run of whitespace as a single separation between fields ( https://www.gnu.org/software/gawk/manual/html_node/Field-Spl... ), quite similar to "standard" Open Group/POSIX awk ( https://pubs.opengroup.org/onlinepubs/9799919799/utilities/a... ).

    ps aux | jq -Rr 'split(" ")[1]'
on the other hand doesn't work here, because `ps aux` pads its columns with a variable number of spaces.

    ps aux | jq -Rr 'split("\\s+"; null)[1]'
seems to work though.


Yes, in awk, conveniently, whitespace means a run of whitespace. It gets a bit verbose in jq to get the same effect; but on the other hand, you get slicing: more slices than `cut` can give you.
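For example, jq's array slices and negative indices can pick ranges of fields, including counting from the end (a hedged sketch with illustrative input):

```shell
# Collect regex-split fields into an array, then slice out fields 2-3.
echo "a b c d" | jq -Rr '[splits("\\s+")] | .[1:3] | join(" ")'
# prints: b c

# Negative index: the last field.
echo "a b c d" | jq -Rr '[splits("\\s+")][-1]'
# prints: d
```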


These comments about more or less clever text-processing tools, each with its own syntax, feel like archaic hacks. Using something like Nushell or PowerShell makes this trivial.

E.g. for the `ps aux` example, using PowerShell, selecting a certain column:

  gps | select id
The output of gps is an array of objects. Selecting a property does this for all elements in the array.


People do this, but it feels a bit weird to start a whole new language interpreter which is much more powerful than the pitiful shell language we use, just to split a field. Why not write your whole program in awk, then? It's likely more efficient anyway.

The canonical shell tool to split fields is cut. Easy to use, simple to read.

For the very common use case of assigning fields to variables, use the shell built-in read. It uses the same IFS as the rest of the shell.
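A minimal POSIX-shell sketch of both (the input lines are illustrative):

```shell
# cut: select the second colon-separated field.
echo "root:x:0" | cut -d ':' -f 2
# prints: x

# read: split a line into variables using IFS (the braces keep
# read and echo in the same subshell of the pipeline).
printf 'alpha beta gamma\n' | { read -r first second rest; echo "$second"; }
# prints: beta
```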


Why not `… | cut -f1 -d " "` though?


Because awk understands "columns" better.

    echo '999 888' | awk '{print $2}'
prints 888


How is that different from

    echo '999 888' | cut -f2 -d " "
except for the default delimiter being space for awk?


   echo '999  888' | cut -f2 -d " " # notice two consecutive spaces
prints an empty line, because cut treats the text between the two consecutive spaces as an (empty) second field.


There are invalid column separators for awk too. To handle consecutive spaces in cut you have to squeeze them first:

  echo '999  888' | tr -s " " | cut -f2 -d " "


Right, that would be expected though? I suppose awk is better for parsing formatted (padded) output.


How can it be "better" when it is exactly the same?


Speaking personally, I find that I always use awk for printing (numbered) fields in shell scripts, and when typing quick ad-hoc commands at the prompt.

The main reason for this is that it's easy to add a grep-like step there, without invoking another tool. So I can add a range test, a literal test, or something more complex than I could do with grep.

In short I can go from:

       awk '{ print $2 }'
To this, easily:

       awk '$1 == "en4:" {print $NF}'
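A range test in the same spirit (a hedged sketch; the input is illustrative):

```shell
# Print the first field only for rows whose second field is >= 10,
# with no separate grep step needed.
printf 'a 5\nb 12\nc 30\n' | awk '$2 >= 10 {print $1}'
# prints: b, then c
```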


Yet using cut, even with extra piping through tr or grep first, is probably more than twice as fast as awk, which may or may not matter depending on what you're doing.


Now imagine what all the LLM copy-pasting could do to our skills...



