*instantly tell whether an arbitrary sentence is grammatical or not* You do real...

make3 · on June 14, 2021

"You do realize we can train a neural network to perform this task"

I'm doing a master's in deep learning for NLP and I'm not sure we can. Language modelling can't do this because grammatical yet semantically implausible combinations of words yield very low perplexity, like the classic being Noam Chomsky's "Colorless green ideas sleep furiously".

What would be a training set for this? I assume we would first try to do parsing to extract the grammatical role of each word. Then what would be the dataset? A massive attempt at generating the set of all possible trees that are grammatical?

I guess we could use massive textual datasets from reputable sources and extract their grammatical role tree, and learn from that. Generating negative examples with sufficient coverage would be very hard. Strict generative modelling without negative examples with good coverage would see the same problem as with language modelling, where acceptable but unlikely examples would have low perplexity despite being good.

It would seem to me that in order to generate negative examples with good coverage, your would need to have a man made program with a definition of what grammaticality means, which would make making a neural network useless to begin with.

Seems like the experts agree with my take: https://linguistics.stackexchange.com/a/1108

p1esk · on June 14, 2021

Constructing a training dataset is a separate problem. You could potentially crowdsource enough negative examples. Once you have the dataset, a neural network would most likely be able to learn to classify sentences with a reasonably good accuracy.

Unlike current DL models, humans have a world model (common sense) which is formed through an ability to create/update/lookup explicit rules/facts. Once we figure out how to incorporate that into a learning algorithm and/or a model architecture, AI will become a lot smarter.

alecst · on June 14, 2021

If we can train a computer to classify sentences as grammatical or not please let me know where. You’ll save the linguistics department a lot of money as they’ll no longer have to contact native speakers for this research.

xapata · on June 13, 2021

Humans require fewer examples to learn language rules. It's not clear that humans use the same learning model a "deep net."

p1esk · on June 13, 2021

Humans also require a lot of examples to learn a language - years of everyday practice for a young human. Learning algorithms are not the same, but you still need to train a large neural network - lots of neurons with lots of connections (weights) - whether it's in your head or in a datacenter.

alecst · on June 14, 2021

There’s some evidence that humans have a Universal Grammar and learn through deletion. And humans can not learn any old language — only a restricted class — meanwhile there’s no reason to think that an ML model would have that problem.

I’d encourage you to read a little more about the topic with an open mind. You might learn something.