Getting Unique Counts From a Log File (hypergeometric.com)
9 points by gpapilion on June 22, 2013 | 8 comments



It amazes me at work how many people violate UUOC/UUOG (useless use of cat, useless use of grep).

e.g.

  cat file | grep bar | awk '{ print $1 }' | sort | uniq
which can simply be

  awk '/bar/ { print $1 | "sort -u" }' file
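(Worth noting: awk's print expr | "command" idiom opens one pipe per distinct command string, reuses it for every matching line, and closes it when the script exits, so sort -u sees the whole stream.)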


I've done this for single-use pipelines that get built up incrementally because the nature of the data is unfamiliar. It usually starts with

   cat file | less
and then

   cat file | grep bar | less
and then

   cat file | grep bar | awk '{ print $1 }' | less
and so on.

But if it's going to be used more than once, it does seem odd to keep the unnecessary commands. Maybe "advanced" awk (beyond print $1) is unknown to people?
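For what it's worth, once you get past print $1, the sort and uniq stages can be folded into awk as well. A sketch using an associative array (reusing the /bar/ and field-1 example from above):

    awk '/bar/ { count[$1]++ } END { for (k in count) print count[k], k }' file

The output order is arbitrary (it follows awk's internal hashing), so pipe it through sort -rn if you want it ranked.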


If you are only doing 'print $1' you should be using cut. But whatever.
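One caveat: cut splits on a single-character delimiter, while awk's default splitting collapses runs of whitespace, so the two only agree on cleanly delimited input. A rough equivalent (assuming single-space-separated fields):

    grep bar file | cut -d' ' -f1 | sort -u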


It's a balance of clarity vs. efficiency. Also, I chose uniq over sort -u for its -c option.
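i.e., to get the counts the post is after, something like (same field-1 extraction as above):

    awk '/bar/ { print $1 }' file | sort | uniq -c | sort -rn

The trailing sort -rn ranks the most frequent values first.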

That being said, point well made.


Well, I often do:

    cat some.log | awk '/regexp/ { print $2 }' | sort | uniq
Then tweak that line to get what I want, then change it to:

    zcat some.log* | ...
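One snag with that zcat step: if the glob also matches the current, uncompressed log, zcat will reject it. gzip -cdf passes non-gzip input through to stdout unchanged, so a sketch that handles the mix of plain and rotated logs:

    gzip -cdf some.log* | awk '/regexp/ { print $2 }' | sort | uniq -c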


There is a great wikibook on this subject.

http://en.wikibooks.org/wiki/Ad_Hoc_Data_Analysis_From_The_U...


Hopefully you've already submitted this, because I'm about to. Looks great.


I believe the word you want is 'glean'.



