Beer and Data Science – Looking at topic modeling in multi-aspect reviews

bcaine · on April 16, 2015

Nice read. I did something sort of similar with the same dataset about a year ago. I compared LDA (Latent Dirichlet Allocation) to TF-IDF as tools to find similar beers based on their review text. Lots of intuitive and funny topics discovered.

I suggest you play with LDA; it seemed to work really well at generating topics. There is also a lot of fascinating, very readable research using it. Check out SNAP's work on the same dataset [1] and some of the Yelp Dataset Challenge winners [2]. If you end up interested in doing so, Gensim [3] was pleasant enough to work with.

[1] http://snap.stanford.edu/data/web-BeerAdvocate.html

[2] http://www.yelp.com/dataset_challenge

[3] https://radimrehurek.com/gensim/wiki.html#latent-dirichlet-a...
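For readers curious what the TF-IDF half of that comparison might look like, here is a minimal sketch using only the Python standard library. It is not the commenter's code: the function names (`tfidf_vectors`, `most_similar`), the beer names, and the review snippets are all made up for illustration. The LDA half would need a topic-modeling library such as Gensim and is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, reviews):
    """Return the beer whose review text is closest to that of `name`."""
    names = list(reviews)
    vectors = dict(zip(names, tfidf_vectors([reviews[n] for n in names])))
    target = vectors[name]
    scores = [(cosine(target, vectors[n]), n) for n in names if n != name]
    return max(scores)[1]


# Made-up review snippets, one per hypothetical beer.
reviews = {
    "Hypothetical IPA": "piney citrus hops with a bitter grapefruit finish",
    "Imaginary Pale Ale": "bright citrus hops and a crisp bitter finish",
    "Made-Up Stout": "roasted coffee and dark chocolate with a creamy body",
}
print(most_similar("Hypothetical IPA", reviews))
```

On real data you would likely concatenate all reviews for a beer into one document and drop stop words first; both steps are omitted here to keep the sketch short.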

gjreda · on April 16, 2015

Great post! I've been thinking about writing something similar with that same BeerAdvocate data. Good job beating me to it :)

Instead, I ended up writing a satirical beer snob bot [1] which tweets nonsensical beer reviews using Markov Chains. Some are bad, but some are pure gold. You can read about it here [2]. The code's also on GitHub [3].

[1] https://twitter.com/BeerSnobSays

[2] http://www.gregreda.com/2015/03/30/beer-review-markov-chains...

[3] https://github.com/gjreda/beer-snob-says
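As a rough idea of how Markov-chain text generation of this kind can work, here is a minimal word-level sketch. It is not taken from the linked repository: the function names, the `order` and `max_words` parameters, and the sample reviews are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def build_chain(reviews, order=1):
    """Map each run of `order` words to the words seen right after it."""
    chain = defaultdict(list)
    for review in reviews:
        words = review.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, max_words=20, seed=None):
    """Walk the chain from a random starting state to produce new text."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a given seed is repeatable
    words = list(state)
    while len(words) < max_words:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        words.append(rng.choice(followers))
        state = tuple(words[-len(state):])
    return " ".join(words)


# Made-up review snippets standing in for real review text.
sample_reviews = [
    "pours a hazy gold with a thin white head",
    "pours a deep amber with notes of caramel",
    "notes of pine and a hint of regret",
]
chain = build_chain(sample_reviews)
print(generate(chain, max_words=12, seed=42))
```

A higher `order` tends to give more grammatical but less surprising output, since each state then constrains the next word more tightly.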

bcohen123 · on April 16, 2015

Cool stuff, followed! Feel free to steal any parts of my work you think may improve it. It might be cool to be able to control the polarity of the review you're tweeting.

socceroos · on April 16, 2015

I just came across a relevant site this morning. Hilarious hipster brew review satire: http://vicioustasting.com/

bcohen123 · on April 16, 2015

Wow, that's pretty well done. Any indication how it's made? Looks too good to be an HMM.

JasonCEC · on April 16, 2015

For anyone interested in beer and data science, my startup [1] uses machine learning and artificial intelligence to build flavor profiling and quality control tools for craft beverage producers.

Our models flag and predict flaws, taints, contaminations, and batch-to-batch deviations in real time from human sensory data. We then leverage our clients' quality control data for flavor profile optimization, demographic targeting, and cognitive marketing - helping them sell consistently better products to their most valuable consumers.

[1] www.Gastrograph.com

bcohen123 · on April 16, 2015

Cool stuff! And nice last name!

archimedespi · on April 16, 2015

Love the license =D

igravious · on April 16, 2015

heh, logged in to post it :)

-- spoiler alert --

[...]

If the Author of the Software (the "Author") needs a place to crash and you have a sofa available, you should maybe give the Author a break and let him sleep on your couch.

If you are caught in a dire situation wherein you only have enough time to save one person out of a group, and the Author is a member of that group, you must save the Author.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO BLAH BLAH BLAH ISN'T IT FUNNY HOW UPPER-CASE MAKES IT SOUND LIKE THE LICENSE IS ANGRY AND SHOUTING AT YOU.

--

cobranet · on April 16, 2015

How do I get the data?

bcohen123 · on April 16, 2015

I grabbed the data from here a while back: https://snap.stanford.edu/data/web-BeerAdvocate.html. Unfortunately, it appears to no longer be available for download. This (https://snap.stanford.edu/data/web-RateBeer.html) seems to be a similar dataset which may be available for use.