
In the experiment on teaching an LSTM to count, it's useful to note that the examples it's trained on are derivations [1] from the grammar a^nb^n (with n > 0), a classic example of a Context-Free Grammar (CFG).

It's well understood that CFGs cannot be induced from examples [2], which accounts for the fact that LSTMs cannot learn "counting" in this manner, nor indeed can any other method that learns from examples.
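To make the setup concrete, here's a minimal sketch (my own, not the article's code) of the kind of training data such an experiment uses, with test-time n deliberately exceeding training-time n:

    # Positive examples of a^n b^n, presented as flat strings only;
    # the learner never sees derivation trees, just the strings.
    def anbn(n):
        return 'a' * n + 'b' * n

    train = [anbn(n) for n in range(1, 11)]   # n = 1..10
    test  = [anbn(n) for n in range(11, 21)]  # unseen, larger n

    print(train[:3])  # ['ab', 'aabb', 'aaabbb']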

_______________

[1] "Strings generated from"

[2] The same goes for any formal grammars other than finite ones, i.e. grammars simpler than regular grammars.




> It's well understood that CFGs cannot be induced from examples

I think you mean something more specific (e.g. exact identification, or learnability in polynomial time in a particular sense).

+ Automatic Learning of Context-Free Grammar (Chen et al.): http://www.aclweb.org/anthology/O06-1004

+ Learning context-free grammars from structural data in polynomial time (Sakakibara, 1994): http://www.sciencedirect.com/science/article/pii/03043975909... (uses positive skeletons)

Nice overview: http://staff.icar.cnr.it/staff/ruffolo/public_html/progetti/...


Aw man! That 1994 paper by Sakakibara is a piece of history! It concludes by saying that it's "part of the work in the major R&D of the Fifth Generation Computer Project conducted under the program set up by MITI" [1]. Plus, Sakakibara is one of my grammar induction heroes :0

However, his algorithm learns CFGs from structural data, which is to say derivation trees (think parse trees). So it's completely irrelevant to the example in the article, which attempts to learn a^nb^n from the strings it generates, and that remains impossible.

As to the other paper, by Chen, Tseng and Chen, that's about learning a CFG that reproduces the strings in a corpus. So it learns a CFG of a (finite) corpus, as opposed to the grammar of a context-free language, which, again, remains impossible to induce from examples.
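To illustrate the difference with a toy encoding of my own (not Sakakibara's notation): a structural example is a derivation tree with its nonterminal labels erased, which carries far more information than the flat string at its leaves.

    # The same a^2 b^2 example, as a flat string vs. as a skeleton
    # (derivation tree of S -> a S b | a b, labels erased to '?').
    string_example = 'aabb'
    skeleton = ('?', 'a', ('?', 'a', 'b'), 'b')

    # The skeleton exposes the nesting that the string hides, which
    # is exactly the extra information Sakakibara's algorithm uses.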

_____________

[1] https://en.wikipedia.org/wiki/Fifth_generation_computer


Also http://nbviewer.jupyter.org/url/norvig.com/ipython/xkcd1313.... , considering regular grammars are a subset of CFGs.


From a quick glance, the example doesn't quite learn regular grammars; rather, it reduces them to finite grammars (e.g. the finite grammar of all US presidential winners and losers) and learns (or, more accurately, invents) a regular expression for them.

Finite grammar induction from positive examples only is feasible in polynomial time, so Peter Norvig's notebook will not cause the fabric of the space-time continuum to be torn asunder, I am sure.
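For a finite language, induction from positive examples can be as simple as memorization, which is one reason it's easy. A sketch (this is not Norvig's code; his notebook goes further and compresses the result into a short regex):

    # 'Learning' a finite language by exact memorization --
    # trivially polynomial in the size of the examples.
    def learn(examples):
        lexicon = set(examples)
        return lambda s: s in lexicon

    accept = learn(['obama', 'bush', 'clinton'])
    print(accept('bush'), accept('perot'))  # True False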


It's not clear that the learnability results about formal grammars are useful for describing what can be learned by practical algorithms, where the grammar does not have to be identified with perfect confidence.

As an example, it is possible to induce a counting algorithm from examples with high probability: https://colala.bcs.rochester.edu/papers/piantadosi2012bootst...


Counting is one thing, and I'm not sure whether it can be learned or not (my understanding is that it can't), but the example in the original article is specifically trying to learn to reproduce strings of exactly n a's followed by exactly n b's, for arbitrary positive n.

That's learning the CFG a^nb^n for n > 0 in a nutshell, and is therefore impossible; indeed, the LSTM in the article falters once test-time n exceeds training-time n by a few units.
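For contrast, the "counting" behaviour the LSTM is being asked to generalize is trivial to write down as a one-counter recognizer; the point is that this program is easy to state but, per the above, not inducible from positive strings alone. A sketch:

    # Recognizing a^n b^n (n > 0) with a single counter.
    def is_anbn(s):
        count, i = 0, 0
        while i < len(s) and s[i] == 'a':   # count the a's
            count += 1
            i += 1
        if count == 0:
            return False
        while i < len(s) and s[i] == 'b':   # cancel them against b's
            count -= 1
            i += 1
        return i == len(s) and count == 0

    print(is_anbn('aaabbb'), is_anbn('aaabb'))  # True False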

The paper you link to discusses subject matter I'm not familiar with (bootstrapping, in the psychological sense), so I'm not really qualified to opine on it. However, from a very quick look, it seems like they're learning to associate counting words with numbers (sorry if I'm wrong about that). Still, that's not "learning to count", where you have to be able to make the inductive leap that n + 1 > n for all n in your target set of numbers, in addition to learning the recursive structure of number strings.

Finally, from the little I've seen of attempts to model counting with machine learning, there hasn't yet been much success, probably because it's impossible to learn a language with both structure and meaning from examples of its structure alone, with no access to its meaning.



