I started playing with Google's BigQuery today. It has public Stack Overflow, HN, and GitHub datasets that make pulling data like this relatively trivial, and I've been super impressed with it. Examples here: https://cloud.google.com/blog/big-data/2016/12/google-bigque... and some more at https://cloud.google.com/bigquery/public-data/stackoverflow

I quickly hacked together this query, which pulls the first Amazon URL out of each answer and lists the 20 most common:

    SELECT REGEXP_EXTRACT(body, r'[^a-z](http[a-z\:\-\_0-9\/\.]+amazon[a-z\:\-\_0-9\/\.]*)[^a-z]') AS link, COUNT(1)
    FROM [bigquery-public-data:stackoverflow.posts_answers] 
    GROUP BY 1 ORDER BY 2 DESC LIMIT 20

It takes 5 seconds to run, over ALL Stack Overflow answers!
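To see what that pattern actually captures, here's a minimal local sketch in Python that runs the same regex over a few made-up answer bodies and does the GROUP BY/COUNT with a Counter. The `regexp_extract` and `top_links` helpers and the sample bodies are hypothetical stand-ins, not BigQuery APIs, and it assumes REGEXP_EXTRACT returns the first capture group, or NULL when nothing matches. Two things fall out of it. Uppercase letters and characters like `?` or `=` end a match, so query strings get cut off. Answers with no Amazon link all land in one NULL group, which in the query above would likely take the top row unless it's filtered out (e.g. with something like `HAVING link IS NOT NULL`).

```python
import re
from collections import Counter

# Same pattern as the BigQuery query. It only uses character classes, escaped
# punctuation and one capture group, so Python's `re` and BigQuery's RE2
# engine should treat it the same way.
AMAZON_LINK = re.compile(
    r'[^a-z](http[a-z\:\-\_0-9\/\.]+amazon[a-z\:\-\_0-9\/\.]*)[^a-z]'
)


def regexp_extract(body):
    """Hypothetical stand-in for REGEXP_EXTRACT: first capture group, or None."""
    match = AMAZON_LINK.search(body)
    return match.group(1) if match else None


def top_links(bodies, limit=20):
    """Hypothetical stand-in for GROUP BY link ORDER BY count DESC LIMIT n."""
    counts = Counter(regexp_extract(body) for body in bodies)
    # Answers without an Amazon link all map to None; drop that group so it
    # doesn't crowd out the real links (the NULL row in the SQL version).
    counts.pop(None, None)
    return counts.most_common(limit)


# Made-up answer bodies, in roughly the HTML form Stack Overflow stores them in.
bodies = [
    '<a href="http://aws.amazon.com/ec2/">EC2</a>',
    'See <a href="http://aws.amazon.com/ec2/">the EC2 docs</a>.',
    '<a href="https://www.amazon.com/dp/0131103628?tag=x">the book</a>',
    '<p>No links here.</p>',
]

for link, n in top_links(bodies):
    print(n, link)
# 2 http://aws.amazon.com/ec2/
# 1 https://www.amazon.com/dp/0131103628
```

Note how the third link loses its `?tag=x`: the character class has no `?`, so the match stops there and the trailing `[^a-z]` consumes it.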