Getting Unique Counts From a Log File (hypergeometric.com)
9 points by gpapilion on June 22, 2013 | 8 comments



It amazes me at work how many people violate UUOC/UUOG (useless use of cat, useless use of grep).

e.g.

  cat file | grep bar | awk '{ print $1 }' | sort | uniq
which can simply be

  awk '/bar/ { print $1 | "sort -u" }' file
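(Worth noting: awk's print expr | "command" idiom opens one pipe per distinct command string, reuses it for every matching line, and closes it when the script exits, so sort -u sees the whole stream.)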


I've done this for single-use pipelines that get built up incrementally because the nature of the data is unfamiliar. It usually starts with

   cat file | less
and then

   cat file | grep bar | less
and then

   cat file | grep bar | awk '{ print $1 }' | less
and so on.

But if it's going to be used more than once, it does seem odd to keep the unnecessary commands. Maybe "advanced" awk (beyond print $1) is unknown to people?
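For what it's worth, once you get past print $1, the sort and uniq stages can be folded into awk as well. A sketch using an associative array (reusing the /bar/ and field-1 example from above):

    awk '/bar/ { count[$1]++ } END { for (k in count) print count[k], k }' file

The output order is arbitrary (it follows awk's internal hashing), so pipe it through sort -rn if you want it ranked.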


If you are only doing 'print $1' you should be using cut. But whatever.
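One caveat: cut splits on a single-character delimiter, while awk's default splitting collapses runs of whitespace, so the two only agree on cleanly delimited input. A rough equivalent (assuming single-space-separated fields):

    grep bar file | cut -d' ' -f1 | sort -u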


It's a balance of clarity vs. efficiency. Also, I chose uniq over sort -u for its -c option.
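i.e., to get the counts the post is after, something like (same field-1 extraction as above):

    awk '/bar/ { print $1 }' file | sort | uniq -c | sort -rn

The trailing sort -rn ranks the most frequent values first.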

That being said, point well made.


Well, I often do:

    cat some.log | awk '/regexp/ { print $2 }' | sort | uniq
Then tweak that line to get what I want, then change it to:

    zcat some.log* | ...
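One snag with that zcat step: if the glob also matches the current, uncompressed log, zcat will reject it. gzip -cdf passes non-gzip input through to stdout unchanged, so a sketch that handles the mix of plain and rotated logs:

    gzip -cdf some.log* | awk '/regexp/ { print $2 }' | sort | uniq -c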


There is a great wikibook on this subject.

http://en.wikibooks.org/wiki/Ad_Hoc_Data_Analysis_From_The_U...


Hopefully you've already submitted this, because I'm about to. Looks great.


I believe the word you want is 'glean'.



