
No offence, but with a megabyte of text per speaker I feel I could do better with word n-gram Markov models (even bigrams), and random path choice.
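
To be concrete, by a word bigram Markov model with random path choice I mean something as dumb as this sketch (Python; the corpus filename and start word are just placeholders):

  import random
  from collections import defaultdict

  def train_bigrams(text):
      # Map each word to the list of words observed to follow it.
      followers = defaultdict(list)
      words = text.split()
      for cur, nxt in zip(words, words[1:]):
          followers[cur].append(nxt)
      return followers

  def generate(followers, start, length=25):
      # Random walk over the bigram table: the "random path choice" part.
      out = [start]
      for _ in range(length - 1):
          candidates = followers.get(out[-1])
          if not candidates:
              break
          out.append(random.choice(candidates))
      return " ".join(out)

  # corpus = open("speaker.txt").read()  # roughly a megabyte of one speaker's text
  # print(generate(train_bigrams(corpus), "I"))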

I think deep learning is great and all ( and I'm meaning to learn it ) but shouldn't it be able to do far better than Markov models, or other simple things?

Image captioning? Incredible. DeepMind winning video games? Incredible. Style transfer? Incredible. With one exception ( I saw it on HN ages ago, sorry I have no link -- it basically generates novel text using deep learning, across all sorts of genres, such as "academic paper", "math paper", "novel", "film script", and I found the results remarkable and interesting ) I question whether many text applications are doing better than Markov.

I think the issue is there is something fundamental and sophisticated about human language which our current deep learning models, with all their omniscient benevolence ( or whatever ), are missing. There's something deep about the structure of language that we are not modelling yet in deep learning as far as I've seen. When we do .... boom ... computers that learn from the internet and amaze us all. Then we'll have something to shine, smile about or fear.

Sorry for the digression and what may be inapplicable comparisons. I can get impassioned about this topic.




>> I think the issue is there is something fundamental and sophisticated about human language which our current deep learning models, with all their omniscient benevolence ( or whatever ), are missing. There's something deep about the structure of language that we are not modelling yet in deep learning as far as I've seen.

I think the secret sauce that's missing from deep learning (as well as from any other kind of statistical language model) is a representation of the context outside language itself.

What I mean is, when we humans [1] communicate using language, the language we generate (and recognise) does not carry all of the information that we want to convey. A lot of the meaning in our utterances ...is not in our utterances.

We haven't really found any way to represent this (dare I say) deep context yet. In general, in NLP, even the word "context" means the context of a token, in other words the tokens around it. Even mighty word vectors work that way.
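
To make that concrete: the "context" that word vectors are trained from is little more than (target word, nearby word) pairs inside a small window, something like this toy sketch (the window size is arbitrary):

  def context_pairs(tokens, window=2):
      # "Context" here means nothing more than the tokens within +/- window positions.
      pairs = []
      for i, target in enumerate(tokens):
          for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
              if j != i:
                  pairs.append((target, tokens[j]))
      return pairs

  print(context_pairs("the cat sat on the mat".split()))
  # Nothing in those pairs knows what a cat or a mat is, or why one sits on the other.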

The problem is of course that it's very hard to even find data to train on, if you want to model that context with some machine learning algorithm. How do you represent everything that a person might know about the world, when they speak or write something?

But without that deep context, any utterance is just random noise, even if it's structurally correct. So we're left with a bunch of techniques that are damn good at modelling structure, but that fail at meaning.

___________

[1] We are all humans here, right? Just in case- I love AI! Go robots!


I think this (learning of hierarchies) could be the start of something relevant to unlocking more workable language modelling:

https://blog.openai.com/learning-a-hierarchy/


I certainly agree. After all, each rule for creating a pair in a syntax tree could be learned, and these rules could then be put together [1] (see the toy sketch below).

[1] https://www.google.cz/search?q=sentence+structure+chomsky&so...
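
A toy version of what I have in mind: a handful of rewrite rules of the sort shown in [1], each of which could in principle be learned rather than hand-written, composed into sentences (the rule set below is made up purely for illustration):

  import random

  # Toy phrase-structure rules; in principle each rule could be learned from data.
  rules = {
      "S":   [["NP", "VP"]],
      "NP":  [["Det", "N"]],
      "VP":  [["V", "NP"], ["V"]],
      "Det": [["the"], ["a"]],
      "N":   [["robot"], ["sentence"]],
      "V":   [["parses"], ["generates"]],
  }

  def expand(symbol):
      # Recursively rewrite a symbol until only terminal words remain.
      if symbol not in rules:
          return [symbol]
      return [word for part in random.choice(rules[symbol]) for word in expand(part)]

  print(" ".join(expand("S")))  # e.g. "the robot generates a sentence"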


I also agree that it's hard to see the benefit of using deep learning (which implies gigantic amounts of data and processing power, therefore costs) over traditional models like n- or skip-grams and Markov chains. At best, you'll be paying a big overhead, in working hours and expertise no less, for just a modest improvement in, say, perplexity.
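
(By perplexity I just mean the usual intrinsic measure: the exponential of the average negative log-probability per token on held-out text. A rough sketch, where bigram_prob stands in for whatever smoothed model you've trained:)

  import math

  def perplexity(tokens, bigram_prob):
      # bigram_prob(prev, word) should return a smoothed P(word | prev) > 0.
      log_sum = sum(math.log(bigram_prob(prev, word))
                    for prev, word in zip(tokens, tokens[1:]))
      return math.exp(-log_sum / (len(tokens) - 1))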

And since the use is in generating natural language, the really important evaluation is extrinsic (how well received your product is by your user base, how well it integrates with what you already have, etc.). In that sense, it's even harder to see the benefit of deep learning over simpler, faster, cheaper models. Is your user base really going to notice a 10% improvement in whatever intrinsic score deep learning gets you?

In any case, I ran a couple of short experiments with the application. Here are the results:

Science fiction (random phrases):

  I was flattered and free, my bracket series of precaution beyond the sound of the constant chart
  I haven't had some resport to see the environment as well as the best of the importance. 
  I don't want to watch up the Godders of the Barbara, what he had anything like that, or the
  I was still resting her forehead beside the first time. 
  I don't understand, but I don't get the oldest reason when I was sure that I could see how to make
  What would he know.
  I had been deported for the blots.
  "What's my own?" "Not being a scene.
  I was all right, of course.
  I was a drift of human transportation.

Medical text (one big sentence):

  For patients with severe and family history of blood transfusion alone that can
  be classified as many identifiable and atrial fibrillation in the clinical
  significance of a still compared to pregnancy and most often become an increased
  risk of diabetes mellitus asymptomatic (AR) and IgG4 in 2008 observational
  studies of cardiovascular resistance that is not associated with an option for
  the regimen to lower that make in the number of decisions to experience the
  receptor which prevention for all doses of circumcision in the following:
  Alternative dietary control and gait and severe health care of pain associated
  with a number of infections in which there is an adverse event of the patient
  and recovery and limited decision to worrien the rate of diabetes mellitus and
  care for the arm is discussed in detail elsewhere. 

More science fiction (interactive, with my own input in brackets):

  [The adarf moved with grace through the] flush of the north and all the captain
  who said, "I'll stay something functional more than [any gosh-darned Wilick
  miner! I mean, F-those people! What do you think,] I do not think why you can see
  that temporarily doesn't matter this morning?" "I know about [your affair with
  Mina, btw. Did you think I wouldn't notice? Pass the] old part of them that and
  they may see but there are the feelings of the speed of your [anger and the
  shortness of your fuse], receiver in the door.
Again, I don't see a great difference from traditional models: in terms of coherence and grammaticality, it's hard to see the benefits of the more expensive techniques. Sorry guys.



