Beer and Data Science – Looking at topic modeling in multi-aspect reviews (ipython.org)
64 points by bcohen123 on April 16, 2015 | 11 comments



Nice read. I did something sort of similar with the same dataset about a year ago. I compared LDA (Latent Dirichlet Allocation) to TF-IDF as tools to find similar beers based on their review text. Lots of intuitive and funny topics discovered.

I suggest you play with LDA; it seemed to work really well at generating topics. There is also a lot of fascinating, very readable research using it. Check out SNAP's work on the same dataset [1] and some of the Yelp Dataset Challenge winners [2]. If you end up interested in doing so, Gensim [3] was pleasant enough to work with.

[1] http://snap.stanford.edu/data/web-BeerAdvocate.html

[2] http://www.yelp.com/dataset_challenge

[3] https://radimrehurek.com/gensim/wiki.html#latent-dirichlet-a...
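
For anyone curious what the TF-IDF side of that comparison can look like, here is a minimal sketch in plain Python. It is not the code from the comment above: the toy reviews are made up, the tokenizer is a bare whitespace split, and the helper names (`tfidf_vectors`, `cosine`, `most_similar`) are hypothetical. It only illustrates the general idea of ranking beers by the cosine similarity of their review text.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(index, vectors):
    """Index of the document closest to vectors[index], excluding itself."""
    scores = [(cosine(vectors[index], v), i)
              for i, v in enumerate(vectors) if i != index]
    return max(scores)[1]

# Made-up review snippets, one per hypothetical beer.
reviews = [
    "hoppy citrus pine bitter finish",
    "roasted coffee chocolate dark malt",
    "citrus hoppy grapefruit bitter",
    "chocolate roasted creamy dark",
]
vectors = tfidf_vectors(reviews)
print(most_similar(0, vectors), most_similar(1, vectors))
```

On real review text you would probably want a proper tokenizer, stop-word removal, and one concatenated document per beer; an LDA version would compare per-beer topic distributions instead of raw term weights.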


Great post! I've been thinking about writing something similar with that same BeerAdvocate data. Good job beating me to it :)

Instead, I ended up writing a satirical beer snob bot [1] which tweets nonsensical beer reviews using Markov Chains. Some are bad, but some are pure gold. You can read about it here [2]. The code's also on GitHub [3].

[1] https://twitter.com/BeerSnobSays

[2] http://www.gregreda.com/2015/03/30/beer-review-markov-chains...

[3] https://github.com/gjreda/beer-snob-says
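
For readers who have not seen the technique, a word-level Markov chain generator can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code from the linked repository: the function names (`build_chain`, `generate`), the `order` parameter, and the two sample sentences are all made up.

```python
import random
from collections import defaultdict

def build_chain(texts, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain

def generate(chain, max_words=20, seed=None):
    """Random-walk the chain from a random starting state."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Made-up sample reviews; a real bot would train on thousands of them.
samples = [
    "pours a hazy gold with a thin head",
    "pours a deep amber with notes of pine",
]
chain = build_chain(samples)
print(generate(chain, max_words=12))
```

Raising `order` makes the output more coherent but closer to the training text; with a small corpus, an order of 1 or 2 tends to give the most entertaining nonsense.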


Cool stuff, followed! Feel free to steal any parts of my work you think may improve it. May be cool to be able to control the polarity of the review you're tweeting.


I just came across a relevant site this morning. Hilarious hipster brew review satire: http://vicioustasting.com/


Wow, that's pretty well done. Any indication how it's made? Looks too good to be an HMM.


For anyone interested in beer and data science, my startup[1] uses machine learning and artificial intelligence to build flavor profiling and quality control tools for craft beverage producers.

Our models flag and predict flaws, taints, contaminations, and batch-to-batch deviations in real time from human sensory data. We then leverage our clients' quality control data for flavor profile optimization, demographic targeting, and cognitive marketing - helping them sell consistently better products to their most valuable consumers.

[1] www.Gastrograph.com


Cool stuff! And nice last name!


Love the license =D


heh, logged in to post it :)

-- spoiler alert --

[...]

If the Author of the Software (the "Author") needs a place to crash and you have a sofa available, you should maybe give the Author a break and let him sleep on your couch.

If you are caught in a dire situation wherein you only have enough time to save one person out of a group, and the Author is a member of that group, you must save the Author.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO BLAH BLAH BLAH ISN'T IT FUNNY HOW UPPER-CASE MAKES IT SOUND LIKE THE LICENSE IS ANGRY AND SHOUTING AT YOU.

--


How can I get the data?


I grabbed the data from here a while back: https://snap.stanford.edu/data/web-BeerAdvocate.html. Unfortunately, it appears to no longer be available for download. This one (https://snap.stanford.edu/data/web-RateBeer.html) seems to be a similar dataset which may be available for use.



