Ah, this brings back memories. I tried something like that about 12 years ago, with a home-brewed half-hardcoded, half-neural thing and a large set of Jules Verne books.
The giddy excitement of being on the verge of unlocking the power of language slowly turned into confusion and then decayed into disappointment as I slowly realized that the task was maybe a bucket of orders of magnitude harder than I thought. I think it produced some coherent phrases every now and then, but mostly it was a random word generator.
...I miss these amplified feelings of cluelessly diving into an impossible project...
> The giddy excitement of being on the verge of unlocking the power of language slowly turned into confusion and then decayed into disappointment as I slowly realized that the task was maybe a bucket of orders of magnitude harder than I thought
This was very wonderfully articulated! Do you have a blog or something?
Well, Orbital Designs http://orbides.org certainly qualifies as "something", but I'm not sure how much of a "blog" it really is, since all of the content is indexed. :)
Strangely enough, I find the output of Markov-chain-type text generators much more readable and coherent than the output of those RNNs.
A Markov chain text generator is also easy to implement and understand.
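For anyone who wants to try it, a first-order, word-level Markov chain generator really is tiny. A minimal Python sketch (the corpus file name is just a placeholder):

    import random
    from collections import defaultdict

    def build_chain(text):
        """Map each word to the list of words observed to follow it."""
        words = text.split()
        chain = defaultdict(list)
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
        return chain

    def generate(chain, length=30):
        """Walk the chain, sampling each next word from the observed followers."""
        word = random.choice(list(chain.keys()))
        output = [word]
        for _ in range(length - 1):
            followers = chain.get(word)
            # Dead end (word only ever appeared at the end of the corpus): restart.
            word = random.choice(followers) if followers else random.choice(list(chain.keys()))
            output.append(word)
        return " ".join(output)

    # Usage, assuming some plain-text corpus on disk:
    # with open("corpus.txt") as f:
    #     print(generate(build_chain(f.read())))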
The interesting properties of RNNs aren't in readability. RNNs can learn syntactic states, e.g. "this text is in quotes", that enable some really cool analysis tricks. OpenAI published an article about this recently, where they showed that an RNN, without being explicitly trained for it, learned the difference between positive and negative reviews. The fact that an RNN encodes sentiment is an incredibly useful feature.
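To make "learned state" concrete: the only memory an RNN carries between steps is a hidden vector, and individual dimensions of that vector are where properties like "inside a quote" or sentiment can end up. A toy numpy sketch of the recurrence (untrained random weights, illustrative sizes, not the OpenAI model):

    import numpy as np

    def rnn_step(x, h_prev, Wxh, Whh, bh):
        """One recurrence: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + b)."""
        return np.tanh(Wxh @ x + Whh @ h_prev + bh)

    vocab, hidden = 50, 16                   # illustrative sizes
    rng = np.random.default_rng(0)
    Wxh = rng.normal(scale=0.01, size=(hidden, vocab))
    Whh = rng.normal(scale=0.01, size=(hidden, hidden))
    bh = np.zeros(hidden)

    h = np.zeros(hidden)
    for char_id in [3, 17, 42]:              # a tiny sequence of character ids
        x = np.zeros(vocab)
        x[char_id] = 1.0                     # one-hot input character
        h = rnn_step(x, h, Wxh, Whh, bh)     # h carries context forward

    # After training, inspecting single dimensions of h is how the
    # "sentiment neuron" style of analysis is done.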
If I were going to guess (not the parent), it's because human parsing is quite localised, and Markov models tend to optimise well for localised coherence.
Their failure mode is long sentences which don't go anywhere (or mean anything), but that's less grating than a blatant parse failure, especially if you're skim reading.
Claude Shannon[1], the information theory guy, worked on machine-generated music quite a lot. Some of the ideas transfer. Information theory informs a lot in this area[2].
I came up with similar results by just tracking word probabilities and then regenerating text based on those probabilities (see the sketch after the samples below).
Feeding it Edgar Allan Poe produced the best results:
twelve vibrations of purple air. vortex of its absolute hurricane.
monck flying footsteps of the picture all the police are concerned.
silence on a year in an identity.
materiality escaping the city was their white spot of printing apparatus all things only one supposes it were its mere duration that it in baltimore monument to its luxuriant nature asserts her person whom i not hesitate about five hundred paces to have not the principia of idiotic thought of curvature or metamorphosis these friends were in an earthquake and the then i felt my chair so rigidly of interest could not summon courage
educationists made around in our modern medicine.
gq did make perfect mankind which the blue pill every week had infinite instant messaging that sucks when your food day that previous background is about being foreign interventions. psychopharmaceutical arsenal.
attached to tail but a distribution units in our traditional estimates that men engineers who negotiate my whole province of long awkward to call it came back go deep conversations you grow fast when there been around the long lever systems.
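One way to read "just tracking word probabilities" is sampling every word independently from its corpus frequency, roughly like this (a minimal sketch; the file name is a placeholder):

    import random
    from collections import Counter

    def train(text):
        """Count how often each word appears in the corpus."""
        return Counter(text.split())

    def generate(counts, length=40):
        """Draw each word independently, weighted by its corpus frequency."""
        words = list(counts)
        weights = [counts[w] for w in words]
        return " ".join(random.choices(words, weights=weights, k=length))

    # counts = train(open("poe.txt").read())
    # print(generate(counts))

Since each word is drawn independently of the previous one, the output has even less local structure than a Markov chain, which fits the samples above.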
Haha yeah, it's a little disappointing (yet interesting) how it comes close but misses the mark. I wonder how far away we are from having machine-generated output that's tough to distinguish from the input you put into it.
The Obama one is much better, but the thing is, none of these are even remotely close. They sort of seem close on the surface if you simply don't engage any critical parts of your brain or try to extract any meaning, but if you do, the semblance of meaning that's inherent in the words melts away into nothing.
To generate output that contained meaning that you could actually engage with, you'd have to first generate the meaning and then turn that into text. That's a fundamentally different process to what is going on here. This is also why 'Turing Test' competitions involving chat bots are never going to give us artificial intelligence.
This sort of text generation is like a magician's trick where he appears to saw the woman in half and put her together again. I don't care how good the magician is at the trick, he's not going to be any help at all to a surgeon that's actually cutting someone in half and then stitching them back together again.
Hahaha! Yeah, maybe that's part of what makes the output distinctive. Do you have any good recommendations for a person looking to improve their shoddy writing?
The title is from the perspective of the submitter (being the husband in the original title); it should be "visakanv-RNN — machine-generated husband chatter", which is a lot less catchy.