Show HN: Cyborg Writer - In-Browser Text Editor with Neural Autocomplete (tenso.rs)
71 points by antimatter15 on Oct 25, 2017 | 27 comments



Hey HN!

This is our latest demo for TensorFire (https://tenso.rs/), a JavaScript library for running neural networks in the browser, GPU-accelerated via WebGL.

Each of these models is a simple two-layer char-rnn (the kind explored in Karpathy's "Unreasonable Effectiveness of RNNs" post http://karpathy.github.io/2015/05/21/rnn-effectiveness/). When you hit "Tab", the editor feeds the contents of your document letter by letter into a network that tries to predict the most likely next letter. It samples from that probability distribution and feeds the sampled letter back into the network to generate an arbitrarily long string.
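
In Python-ish form, the loop looks roughly like this (purely illustrative; predict_probs stands in for the trained network and is not TensorFire's actual API, and the temperature knob is just the usual way these demos trade coherence for variety):

  import numpy as np

  def generate(seed_text, predict_probs, vocab, length=200, temperature=0.5):
      # predict_probs(text) -> array of P(next char) over the characters in vocab
      text = seed_text
      for _ in range(length):
          p = predict_probs(text)                      # distribution over vocab
          p = np.exp(np.log(p + 1e-12) / temperature)  # reshape by temperature
          p = p / p.sum()
          next_char = np.random.choice(vocab, p=p)     # sample, don't argmax
          text += next_char                            # feed the sample back in
      return text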

We've published code and instructions for training new models on Github (https://github.com/tensorfire/cyborg-writer/tree/master/trai...). With about a megabyte of any sort of text, you can train a model and easily embed it into any web application.
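
Training boils down to something like the following (a generic Keras sketch loosely following the lstm_text_generation example, not necessarily the exact script in the repo; corpus.txt, the layer sizes, and the hyperparameters are placeholders):

  import numpy as np
  from keras.models import Sequential
  from keras.layers import LSTM, Dense

  text = open('corpus.txt').read()      # ~1 MB of whatever text you like
  chars = sorted(set(text))
  char_idx = {c: i for i, c in enumerate(chars)}
  seq_len, step = 40, 3

  # Slice the corpus into (window, next char) pairs, one-hot encoded.
  # For much larger corpora, feed batches from a generator instead.
  windows = [text[i:i + seq_len] for i in range(0, len(text) - seq_len, step)]
  targets = [text[i + seq_len] for i in range(0, len(text) - seq_len, step)]
  X = np.zeros((len(windows), seq_len, len(chars)), dtype=np.bool_)
  y = np.zeros((len(windows), len(chars)), dtype=np.bool_)
  for i, window in enumerate(windows):
      for t, c in enumerate(window):
          X[i, t, char_idx[c]] = 1
      y[i, char_idx[targets[i]]] = 1

  # Two stacked LSTM layers with a softmax over the character set.
  model = Sequential([
      LSTM(128, return_sequences=True, input_shape=(seq_len, len(chars))),
      LSTM(128),
      Dense(len(chars), activation='softmax'),
  ])
  model.compile(loss='categorical_crossentropy', optimizer='adam')
  model.fit(X, y, batch_size=128, epochs=20)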

We (@bijection and @antimatter15) will be answering questions in the comments.


No offence, but with a megabyte of text per speaker I feel I could do better with word n-gram Markov models (even bigrams), and random path choice.
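
Something like this, for reference (a quick illustrative sketch of that baseline, nothing more):

  import random
  from collections import defaultdict

  def train_bigrams(text):
      # word-bigram Markov model: map each word to the words that follow it
      follows = defaultdict(list)
      words = text.split()
      for a, b in zip(words, words[1:]):
          follows[a].append(b)
      return follows

  def generate(follows, seed, length=50):
      out = [seed]
      for _ in range(length):
          candidates = follows.get(out[-1])
          if not candidates:
              break
          out.append(random.choice(candidates))  # "random path choice"
      return ' '.join(out)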

I think deep learning is great and all (and I'm meaning to learn it), but shouldn't it be able to do far better than Markov models or other simple things?

Image captioning? Incredible. DeepMind winning video games? Incredible. Style transfer? Incredible. With one exception (something I saw on HN ages ago, sorry, I have no link -- it basically generates novel text using deep learning across all sorts of genres, such as "academic paper", "math paper", "novel", and "film script", and I found the results remarkable and interesting), I question whether many text applications are doing better than Markov.

I think the issue is that there is something fundamental and sophisticated about human language which our current deep learning models, with all their omniscient benevolence (or whatever), are missing. There's something deep about the structure of language that we are not yet modelling in deep learning, as far as I've seen. When we do ... boom ... computers that learn from the internet and amaze us all. Then we'll have something to marvel at, smile about, or fear.

Sorry for the digression and what may be inapplicable comparisons. I can get impassioned about this topic.


>> I think the issue is that there is something fundamental and sophisticated about human language which our current deep learning models, with all their omniscient benevolence (or whatever), are missing. There's something deep about the structure of language that we are not yet modelling in deep learning, as far as I've seen.

I think the secret sauce that's missing from deep learning (as well as any other kind of statistical language model) is a representation of the context outside language itself.

What I mean is, when we humans [1] communicate using language, the language we generate (and recognise) does not carry all of the information that we want to convey. A lot of the meaning in our utterances ... is not in our utterances.

We haven't really found any way to represent this (dare I say) deep context yet. In general, in NLP, even the word "context" means the context of a token, in other words the tokens around it. Even mighty word vectors work that way.

The problem, of course, is that it's very hard to even find data to train on if you want to model that context with some machine learning algorithm. How do you represent everything that a person might know about the world when they speak or write something?

But without that deep context, any utterance is just random noise, even if it's structurally correct. So we're left with a bunch of techniques that are damn good at modelling structure, but when it comes to meaning, we fail.

___________

[1] We are all humans here, right? Just in case: I love AI! Go robots!


I think this, learning of hierarchies, could be the start of something relevant to unlocking more workable language modelling:

https://blog.openai.com/learning-a-hierarchy/


I certainly agree. After all, each rule for creating a pair in a syntax tree could be learned, and these rules could be put together [1].

[1] https://www.google.cz/search?q=sentence+structure+chomsky&so...


I also agree that it's hard to see the benefit of using deep learning (which implies gigantic amounts of data and processing power, and therefore costs) over traditional models like n- or skip-grams and Markov chains. At best, you'll be paying a big overhead, in working hours and expertise no less, for just a modest improvement in, say, perplexity.

And since the use is in generating natural language, the really important evaluation is extrinsic (how well received your product is by your user base, how well it integrates with what you have already, etc.). In that sense, it's even harder to see the benefit of deep learning over simpler, faster, cheaper models. Is your user base really going to notice a 10% improvement in whatever intrinsic score deep learning gets you?
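
(By "intrinsic score" I mean something like perplexity, i.e. the exponentiated average negative log-likelihood a model assigns to held-out text; a tiny sketch:)

  import math

  def perplexity(token_log_probs):
      # token_log_probs: natural-log probabilities the model assigned to each
      # held-out token; lower perplexity means a better intrinsic fit
      return math.exp(-sum(token_log_probs) / len(token_log_probs))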

In any case, I ran a couple of short experiments with the application. Here are the results:

Science fiction (random phrases):

  I was flattered and free, my bracket series of precaution beyond the sound of the constant chart
  I haven't had some resport to see the environment as well as the best of the importance. 
  I don't want to watch up the Godders of the Barbara, what he had anything like that, or the
  I was still resting her forehead beside the first time. 
  I don't understand, but I don't get the oldest reason when I was sure that I could see how to make
  What would he know.
  I had been deported for the blots.
  "What's my own?" "Not being a scene.
  I was all right, of course.
  I was a drift of human transportation.
Medical text (one big sentence):

  For patients with severe and family history of blood transfusion alone that can
  be classified as many identifiable and atrial fibrillation in the clinical
  significance of a still compared to pregnancy and most often become an increased
  risk of diabetes mellitus asymptomatic (AR) and IgG4 in 2008 observational
  studies of cardiovascular resistance that is not associated with an option for
  the regimen to lower that make in the number of decisions to experience the
  receptor which prevention for all doses of circumcision in the following:
  Alternative dietary control and gait and severe health care of pain associated
  with a number of infections in which there is an adverse event of the patient
  and recovery and limited decision to worrien the rate of diabetes mellitus and
  care for the arm is discussed in detail elsewhere. 
More science fiction (interactive, with my own input in brackets):

  [The adarf moved with grace through the] flush of the north and all the captain
  who said, "I'll stay something functional more than [any gosh-darned Wilick
  miner! I mean, F-those people! What do you think,] I do not think why you can see
  that temporarily doesn't matter this morning?" "I know about [your affair with
  Mina, btw. Did you think I wouldn't notice? Pass the] old part of them that and
  they may see but there are the feelings of the speed of your [anger and the
  shortness of your fuse], receiver in the door.
Again, I don't see a great difference from traditional models; in terms of coherence and grammaticality, it's hard to see the benefits of the more expensive techniques. Sorry, guys.


Ha very nice!

I made a very similar (but much rougher around the edges) project for fun a few months ago, using a smaller French corpus of press releases from La Quadrature du Net (a non-profit lobbying for net neutrality in France and Europe).

There is a live demo at http://jorquera.net/quadra_neural/. You can press tab to auto-complete word-by-word, or just start typing and the suggestions update on the fly. Instead of "Weirdness" I called my cursor "Drunkenness" :)

I used keras-js (https://github.com/transcranial/keras-js) to run my model in the browser.

If you are interested in this kind of experiment, do go for it! It's quite accessible, and projects of this scale do not require heavy hardware (I generated my models using a GTX 770).

If you need a kickstart, in addition to the repo from the OP, all the code for my project is Free Software (https://github.com/tomjorquera/quadra_neural), and I tried to document the generation process in a Jupyter notebook: https://github.com/tomjorquera/quadra_neural/blob/master/gen.... While I did it on a French corpus, it should work with any language.

I used two main resources for inspiration:

- http://karpathy.github.io/2015/05/21/rnn-effectiveness/

- https://github.com/fchollet/keras/blob/master/examples/lstm_...


Pretty hilarious. I've found Tupac to work the best, e.g. (my starting text in quotes):

  "I was reading Hacker News when" he gladed out
  Somebody excuse I went down today
  I wonder if I give a fuck, when I creep
For some reason Donald Trump always seems to produce garbage...


"For some reason Donald Trump always seems to produce garbage..."

I think that means it's working correctly.


Eh, it's scarcely better than a Markov chain. I was hoping for something a little more useful, like the ability to scroll through many similar words or concepts.


You might like this, then: a word2vec tool which suggests words of the same rough meaning starting with a particular letter or spelling, allowing you to write Genesis solely in words starting with 'a' (https://llamasandmystegosaurus.blogspot.com/2017/05/alpha.ht...), or to generate large numbers of rhyming definitions (see comments). Some examples:

    Cat: purr fur
    Dolphin: bark shark
    Puppy: yelp welp
    Shetland pony: shag nag
    Mosquito: mighty bitey
    Shrimp: chill krill
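
If you want to play with the first idea, here's a rough gensim sketch (the vectors file and the filter are placeholders, not the linked tool's actual code):

  from gensim.models import KeyedVectors

  # any pretrained word2vec vectors will do; the path is just an example
  vectors = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)

  def similar_starting_with(word, letter, topn=500):
      # nearest neighbours in vector space, filtered by first letter
      neighbours = vectors.most_similar(word, topn=topn)
      return [w for w, _score in neighbours if w.lower().startswith(letter)]

  print(similar_starting_with('beginning', 'a'))  # candidates for an all-'a' Genesis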


Fun with Shakespeare:

It seemed, somehow mock'd; Unbashed as it was my father.

My lady shall have him, perchance and reasons will abide it. Upon his unspoken thing, the devil shall see.

MISTRESS QUICKLY: Hide amongst you the gods? Wherefore comes here, sir?


We have to be pretty close to someone having a system that generates mountains of convincing/unique "content" using neural nets (or something comparable) for the purposes of web spam/SEO, right?

Back in the day, SEOs would use spun content to generate lots of unique variants. But at this point, surely an ML algorithm can crack this nut?


Sure, at this point building RNNs that generate unique text on a particular topic isn't a big deal; it's something that can be given as a homework exercise for undergrads.


I was expecting it to at least generate valid words, but that doesn't seem to be the case...


Safari on latest MacOS gives me unintelligible words pretty often.

My input: "I went to the grocery store to pick up some milk for"

Output on wikipedia setting:

"I went to the grocery story to pick up some milk for ... proposes of such to take a amplifier and the exporteruation of the player of former"

None of the options seem to key off of the text of my sentence except for vaguely the last word (in my case a generic preposition). Am I doing it wrong?

Here's another example, with the ellipsis marking where I pressed tab:

> As I walked along the forest trail, I came to a most amazing [...] the history of the town himself, the complete material was made to be a traditional


You're starting the text like a story (in the first person, even!). Wikipedia articles aren't usually written like that, so the network is in unfamiliar territory. Its only option is to quickly revert to something closer to the style of an informative article.


What is it generating for you? (and what OS/browser are you using?)


This is actually pretty neat, although it requires more digging.


Is there a technical reason why it generates text with a lot of spelling mistakes?

Ayn Rand seems very sloppy with her spelling!


It's pretty neat, even got a t.co URL. Sadly, it didn't lead anywhere.


Mobile Safari just sits there...


Hitting tab crashes Chrome for me.


Thanks for the tip! What OS and Chrome version are you using?


Not OP, same issue here. Whole browser just locks up, pretty impressive.

Windows 7, Version 61.0.3163.100 (Official Build) (64-bit)


Same specs here.


Reporting the same issue, same build.



