In the experiment on teaching an LSTM to count, it's useful to note that the examples it's trained on are derivations [1] from a grammar for a^nb^n (with n > 0), a classic example of a Context-Free Grammar (CFG).
It's well understood that CFGs cannot be induced from examples [2], which accounts for the fact that LSTMs cannot learn "counting" in this manner, nor indeed can any other learning method that learns from examples.
_______________
[1] "Strings generated from"
[2] The same goes for any formal grammars other than finite ones, as in simpler than regular.
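To make the setup concrete, here's a minimal sketch (my own illustration, not code from the article) of what the training data for such an experiment looks like: flat strings drawn from a^nb^n, with all of the derivation structure thrown away.

```python
# Sketch: the learner only ever sees flat strings like these --
# the CFG's derivation trees are not part of the training data.
def anbn(n):
    """Return the unique string a^n b^n for a given n > 0."""
    return "a" * n + "b" * n

examples = [anbn(n) for n in range(1, 6)]
print(examples)  # ['ab', 'aabb', 'aaabbb', 'aaaabbbb', 'aaaaabbbbb']
```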
Aw man! That 1994 paper by Sakakibara is a piece of history! It concludes by
saying that it's "part of the work in the major R&D of the Fifth Generation
Computer Project conducted under the program set up by MITI" [1]. Plus, Sakakibara
is one of my grammar induction heroes :0
However, his algorithm learns CFGs from structural data, which is to say, derivation trees (think parse trees). So it's completely irrelevant to the example in the article, which attempts to learn a^nb^n from examples of its derivations, and that remains impossible.
As to the other paper, by Chen, Tseng and Chen, that's about learning a CFG that reproduces the strings in a corpus, i.e. learning a CFG of a corpus as opposed to the grammar of a context-free language (therefore, a context-free grammar), which, again, remains impossible.
From a quick glance, the example doesn't quite learn regular grammars - rather, it reduces them to finite grammars (e.g. the finite grammar of all US presidential winners and losers) and learns (or, more accurately, invents) a regular expression for them.
Finite grammar induction from positive examples only is feasible in polynomial time, so Peter Norvig's notebook will not cause the fabric of the space-time continuum to be torn asunder, I am sure.
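A toy illustration (my own, not Norvig's actual code) of why finite-language induction from positive examples is tractable: when the target language is finite, simply memorizing the sample and emitting an alternation already yields a correct regular expression in time linear in the sample size.

```python
import re

def learn_finite(examples):
    """Trivially 'learn' a finite language from positive examples:
    return a regex that is the alternation of the (escaped) examples.
    Runs in time linear in the total size of the sample."""
    return "^(" + "|".join(re.escape(s) for s in sorted(set(examples))) + ")$"

pattern = learn_finite(["ab", "aabb", "aaabbb"])
assert re.match(pattern, "aabb")
assert not re.match(pattern, "aaaabbbb")  # anything outside the sample is rejected
```

Norvig's notebook of course does something far more interesting (compressing the expression), but trivial alternation is enough to show the polynomial-time feasibility claim.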
It's not clear that the learnability results about formal grammars are useful for describing what can be learned by practical algorithms, where the grammar does not have to be identified with perfect confidence.
Counting is one thing, and I'm not sure whether that is possible to learn or not (my understanding is that it's not), but the example in the original article is specifically trying to learn to reproduce strings of exactly n a's followed by exactly n b's, for arbitrary positive n.
That's learning the CFG of a^nb^n for n > 0 in a nutshell, and therefore impossible, and indeed the LSTM in the article falters once test-time n exceeds training-time n by a few units.
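What the LSTM would have to represent, in effect, is an unbounded counter. A hand-written recognizer for a^nb^n makes that explicit (a sketch of my own, not from the article):

```python
def is_anbn(s):
    """Recognize a^n b^n (n > 0) using one unbounded counter --
    exactly the resource a finite-state learner lacks."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "a":  # count up over the a's
        count += 1
        i += 1
    while i < len(s) and s[i] == "b":  # count down over the b's
        count -= 1
        i += 1
    # accept only if the whole string was consumed, the counts
    # balanced out, and the string was non-empty (n > 0)
    return i == len(s) and count == 0 and len(s) > 0

assert is_anbn("aaabbb")
assert not is_anbn("aabbb")
assert not is_anbn("abab")
```

Any device with only bounded state can imitate this up to some fixed n, which is consistent with the LSTM working on training-range n and failing a few units beyond it.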
The paper you link to discusses subject matter I'm not familiar with (bootstrapping as in psychology, say) so I'm not really qualified to opine on it. However, from a very quick look it seems like they're learning to associate counting words with numbers (sorry if I'm wrong about that). Still, that's not "learning to count", where you have to be able to make the inductive leap that n + 1 > n, for all n in your target set of numbers, in addition to learning the recursive structure of number strings.
Finally, from the little I've seen of attempts to model counting with machine learning, there hasn't yet been much success, probably because it's impossible to learn a language with both structure and meaning from examples only of its structure and no access to their meaning.