I am happy to hear that you had a good first impression. At Netflix, we do some Linux scheduler instrumentation with eBPF, and overhead matters. I was inspired to create the tool to enable the traditional performance work loop: get a baseline, tweak code, get another reading, rinse & repeat.
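For illustration only, here is a minimal sketch of that kind of measurement pass using the BCC Python bindings. This is not the Netflix tool, just a generic counter of sched_switch tracepoint hits you could re-run after each tweak and compare against the baseline; the map name and the 10-second window are arbitrary choices, and it needs root to load the probe.

```python
import ctypes as ct
import time

from bcc import BPF

# Count scheduler context switches via the sched:sched_switch tracepoint.
prog = """
BPF_ARRAY(counts, u64, 1);

TRACEPOINT_PROBE(sched, sched_switch) {
    int key = 0;
    u64 *val = counts.lookup(&key);
    if (val) {
        __sync_fetch_and_add(val, 1);
    }
    return 0;
}
"""

b = BPF(text=prog)                 # compiles and attaches the probe (requires root)
time.sleep(10)                     # measurement window: run your workload here
print("sched_switch events in 10s:", b["counts"][ct.c_int(0)].value)
```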
Apropos of nothing, Ellen Pao commented on how Reddit handles user data:
> In 2014, we decided not to sell reddit user data because there is no way to monitor and verify use. There is no way to ensure data is deleted/corrected. It may be on any engineer's laptop. Who had access to copy it? How do you correct or delete PII or unauth porn after it's sold?
Note that this is essentially the same as Facebook's policy: they did not sell user data as part of their advertising business. It was an app that siphoned up user data without Facebook being paid for it.
A lot of reddit data is actually far more accessible than Facebook's, via the API or the public data dump on BigQuery. What makes the data less useful is the comparative anonymity (reddit users tend to use pseudonyms) and the fact that you can't target those users with ads after building a targeting model from the scraped data.
You could still trawl the reddit data to find useful correlations, and maybe get a good deal on ads if you notice the overlap of /r/the_donald with /r/bedwetting or whatever.
As someone who's done exactly that with the Reddit data (http://minimaxir.com/2016/06/reddit-related-subreddits/), I can confirm that, as you note, posts/comments aren't that useful for intent. The valuable Reddit data is the nonpublic view/subscription/upvote/downvote behavior.
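For the curious, a rough sketch of the subreddit-overlap idea mentioned above. It assumes you have already exported (author, subreddit) pairs from the public comment dump into a CSV; the file name and column names are hypothetical placeholders.

```python
import csv
from collections import defaultdict
from itertools import combinations

# Map each subreddit to the set of authors who commented there.
# "reddit_comments.csv" with columns (author, subreddit) is a hypothetical
# export from the public comment dump, not a real file.
authors_by_sub = defaultdict(set)
with open("reddit_comments.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["author"] != "[deleted]":
            authors_by_sub[row["subreddit"]].add(row["author"])

def jaccard(a, b):
    """Shared-commenter overlap between two author sets, 0.0..1.0."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Rank subreddit pairs by overlap of their commenter populations.
pairs = combinations(authors_by_sub, 2)
scored = sorted(
    ((jaccard(authors_by_sub[x], authors_by_sub[y]), x, y) for x, y in pairs),
    reverse=True,
)
for score, x, y in scored[:20]:
    print(f"{score:.3f}  /r/{x}  /r/{y}")
```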
This is a pretty good idea. Something that scans tweets and other publicly accessible social media for a list of known words and then compiles the matches into a report for potential employers would go gangbusters in the HR/hiring space. It obviously couldn't catch everything, but it would be a big help for employers.
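Roughly, the core of such a scan might look like the sketch below. The flagged-word list and example posts are placeholders; real input would come from whatever public posts were scraped.

```python
import re

# Hypothetical inputs: in practice the word list would come from a file and
# the posts from scraped public profiles.
FLAGGED_WORDS = {"confidential", "nsfw"}
posts = [
    "Great meetup tonight, thanks everyone!",
    "ugh, leaked something confidential at work today",
]

def scan(posts, flagged):
    """Return (post, matched words) for every post containing a flagged word."""
    report = []
    for post in posts:
        words = set(re.findall(r"[a-z']+", post.lower()))
        hits = words & flagged
        if hits:
            report.append((post, sorted(hits)))
    return report

for post, hits in scan(posts, FLAGGED_WORDS):
    print(f"flagged {hits}: {post!r}")
```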
Sounds wonderful. /s An automated tool that disqualifies people from employment, based on a naive algorithm, for something ill-advised they once wrote in college.
ADDED: Of course, if you want to be an authentically disruptive Silicon Valley startup, you'd offer both this service and a service for end users to expunge any online content that would be flagged by the employer-facing service. Should be good for some fawning tech press writeups.
I'm sure people who don't get a job because of something they said in the past wouldn't like it. I don't really like the idea of not hiring someone based on behavior far in the past (recent public behavior on social media is fair game in my mind).
However, in terms of a startup product that companies would pay for, I stand by my claim that it's a great idea. After all, HN is run by Y Combinator, and startups are kind of its thing.