
YES! These are pretty much exactly the methods I used when I developed my project http://prosecraft.io

You can see the emotional story arc -- the shapes of the stories -- for more than 16,000 books.

I train a Word2Vec model on the vocabulary of all those books (almost 1.5 billion words) and then I use a clustering algorithm to score all those words on a sentiment scale of 1 to 10 (where 1 is the most negative and 10 is the most positive). Then I break the books into 50 equal-sized chunks and aggregate the positive and negative scores for each chunk.
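The chunk-and-aggregate step can be sketched in a few lines. This is an illustrative toy, not the site's actual code: the real pipeline derives word scores from a Word2Vec model, while here a tiny hand-made lexicon (1 = most negative, 10 = most positive) stands in for it.

```python
# Hypothetical sketch of the "50 equal-sized chunks" aggregation step.
# SENTIMENT is a stand-in for the Word2Vec-derived sentiment lexicon.
SENTIMENT = {"dark": 2.0, "danger": 1.5, "death": 1.0,
             "good": 8.0, "great": 9.0, "love": 9.5}
NEUTRAL = 5.5  # midpoint of the 1-10 scale

def story_arc(words, n_chunks=50):
    """Split a token list into n_chunks equal-sized chunks and return the
    mean sentiment deviation from neutral for each chunk."""
    size = max(1, len(words) // n_chunks)
    arc = []
    for i in range(n_chunks):
        chunk = words[i * size:(i + 1) * size]
        if not chunk:
            break
        scores = [SENTIMENT[w] - NEUTRAL for w in chunk if w in SENTIMENT]
        arc.append(sum(scores) / len(scores) if scores else 0.0)
    return arc
```

Plotting the resulting list of per-chunk means gives the rising-and-falling line you see on the book pages.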

You can click on any of the chart segments to see a word cloud of all the words that contributed to the positive and negative sentiment of that chunk. You can really see the ups and downs of the stories, as the protagonists struggle to overcome their obstacles, when you look at those charts!

Here are a few of my favorite example books to show people:

The Hobbit

http://prosecraft.io/library/j-r-r-tolkien/the-hobbit/

Harry Potter and the Deathly Hallows

http://prosecraft.io/library/j-k-rowling/harry-potter-and-th...

Animal Farm

http://prosecraft.io/library/george-orwell/animal-farm/

I first encountered this method not through Vonnegut but through the "Hedonometer" project at the University of Vermont Computational Story Lab. They apply this technique to the Twitter firehose to measure the overall emotional arc of the world, as expressed in social media.

https://hedonometer.org/timeseries/en_all/

There's an excellent episode of the podcast Lexicon Valley where they discuss the hedonometer project, with the researchers at UVM who developed it...

http://www.slate.com/articles/podcasts/lexicon_valley/2015/0...



I don't mean to be overly negative, but browsing through some titles, the "emotional story arc" is indistinguishable from a randomly generated line graph. Clicking on the bars reveals how this was obtained... "bad, death, dark, danger" = lower score, "good, great, love" = higher score. Of course such a trivial and simplistic analysis cannot ever produce any meaningful result.

The "most passive page" thing also does not seem to be working. Passive as in passive voice? If yes it's also pretty off the mark.


I respect your skepticism :)

It's easy to imagine exceptions to the idea of a simple numerical word-scoring algorithm...

Of course, a word like "bad" might be used ironically, or in some other slang-sense, with a different literal meaning on the page...

But that's totally fine. In principle, the word2vec algorithm is designed to cope with ambiguities like that.

When you analyze billions of words of prose, you can build a model of word-associativity that captures the superposition of all those different word-senses, and the contexts where they tend to appear on the page.

After a big crazy machine-learning process, each word is modeled as a vector in 300-dimensional space, with a vast network of associations and relationships between the other words in the vector-space, based on the way those words are used together in typical English grammar.

When we score the emotional valence of a particular word, we use a "word-vector" technique where those ambiguities are basically already priced into the scoring calculation. Words with a "less ambiguous" sentiment score (joy, paradise, ..., agony, depression) have their lack-of-ambiguity baked into the formula already.

Extreme scores are reserved for words with unambiguous intensity.

But the important thing is: we're not really as concerned about the numerical scores of individual words as we are with the shifting balance of those sentiment scores over the course of a long document.

It's not a perfect way of scoring sentiment of individual words, but it's REALLY reliable for estimating the basic structure of a narrative.
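To make the "ambiguity priced in" idea concrete, here's a hedged sketch (not the author's actual formula) of one common word-vector scoring technique: score a word by comparing its embedding's cosine similarity to small seed sets of clearly positive and clearly negative words. Toy 3-d vectors stand in for real 300-dimensional Word2Vec ones; all the vectors and seed words below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings: words used in similar contexts get similar vectors.
VECS = {
    "joy":       (0.90, 0.10, 0.00),
    "paradise":  (0.80, 0.20, 0.10),
    "agony":     (-0.90, 0.10, 0.00),
    "wonderful": (0.85, 0.15, 0.05),
    "dreadful":  (-0.80, 0.20, 0.10),
}

POS_SEEDS = ["joy", "paradise"]
NEG_SEEDS = ["agony"]

def valence(word):
    """Mean similarity to positive seeds minus mean similarity to
    negative seeds: positive for happy words, negative for sad ones."""
    pos = sum(cosine(VECS[word], VECS[s]) for s in POS_SEEDS) / len(POS_SEEDS)
    neg = sum(cosine(VECS[word], VECS[s]) for s in NEG_SEEDS) / len(NEG_SEEDS)
    return pos - neg
```

A word whose contexts are genuinely mixed ends up with a vector between the two seed clusters, so its valence lands near zero; only words used consistently one way get extreme scores.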


Wow! I've been a longtime lurker on HN, and I created an account to just tell you that prosecraft.io is beautifully designed! Would you mind sharing what visualization tools or libraries you used to render your graphs?


Awww thank you! I really appreciate it!

I'm not using any visualization libraries. It's all just hand-coded JavaScript... I've been meaning to learn D3 for a long time, but I haven't gotten around to it yet.


Oops, I almost forgot... The one viz component I'm using is the excellent WordCloud2 library by Timothy Guan-tin Chien...

https://timdream.org/portfolio/wordcloud/


Even more impressive! I immediately assumed D3 but wasn't entirely sure. Congrats again on this work corroborating Vonnegut's 'shapes' of stories.


Second this. Love the prose craft site! Beautiful!


Your site is very impressive!

Of the books you've analyzed, it's interesting, though not necessarily surprising, to see that a Palahniuk book has the least "passive voice" usage (1).

1.) http://prosecraft.io/library/chuck-palahniuk/pygmy/


Nice site, it's fun to look around!

I threw a curveball at it: http://prosecraft.io/library/mark-z-danielewski/house-of-lea...

It would be interesting to see if Prosecraft would ever correlate "similar books" with Borges since Danielewski said that was an influence.


Right now the "similar books" thing is based on a "topic-model"...

So books are more likely to be similar if they're roughly in the same genre and discuss similar kinds of topics (dragons, computers, romance, spies, war, shopping, time-travel, magic, hunting, etc).

Someday I hope the "similar books" feature will be a bit more sophisticated, where other kinds of "similarity" will also be relevant, beyond just the topic-model... Other things like: story structure, narrative voice, irony, vocabulary, sense-of-humor, lyricisim, etc...


This is a beautiful site done in such an original way. I just tried it on my favorite Vonnegut book: http://prosecraft.io/library/kurt-vonnegut/mother-night/



