
It parses "fruit flies like a banana" the same way as "Time flies like an arrow".

https://en.wikipedia.org/wiki/Time_flies_like_an_arrow;_frui...




Similarly, it seems to fail on "The old man the boat", marking "man" as a noun. The meaning of the sentence, however, is fairly unambiguous in this case, even though parsing it can be tricky. See also: https://en.wikipedia.org/wiki/Garden_path_sentence


In some sense, it's a mark of success for an AI system to fail in the same way that humans do. "The old man the boat" is a terrible sentence, essentially ungrammatical.


Can a sentence be terrible? In what way is it ungrammatical?


That's a good example of why this tool should probably have an option to output the top-N guesses.

Some sentences are just totally ambiguous without context. "Fruit flies like a banana." isn't even good English. Is the sentence trying to say "Some particular fruit flies like a particular banana"? Or "All fruit flies like any banana"?

By the way, Spacy creator - how's the NER coming along?


Spacy's implementation, assuming it's roughly equivalent to the one syllog1sm blogged about, just does a greedy incremental parse, so it only produces one candidate parse.

It is possible to do incremental dependency parsing with a beam, but all the copying of beam "states" is expensive and there are no guarantees that the n complete parses in the beam are really the n best parses w.r.t. the model.
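To make the greedy/beam distinction above concrete, here is a minimal toy sketch. It is not Spacy's code: `score_actions`, the action names, and all the probabilities are made up for illustration, and a real parser would score transitions from features of its stack and buffer.

```python
import math

def score_actions(history):
    """Hypothetical toy model: log-probability of each possible next action."""
    if not history:
        probs = {"A": 0.6, "B": 0.4}
    elif history[0] == "A":
        probs = {"X": 0.55, "Y": 0.45}
    else:
        probs = {"X": 0.9, "Y": 0.1}
    return {action: math.log(p) for action, p in probs.items()}

def greedy_decode(n_steps):
    """Commit to the single best-scoring action at every step."""
    history, total = [], 0.0
    for _ in range(n_steps):
        scores = score_actions(history)
        action = max(scores, key=scores.get)
        history.append(action)
        total += scores[action]
    return history, total

def beam_decode(n_steps, width):
    """Keep the `width` best partial sequences at every step."""
    beam = [([], 0.0)]
    for _ in range(n_steps):
        candidates = []
        for history, total in beam:
            for action, logp in score_actions(history).items():
                # Every candidate needs its own copy of the state.
                candidates.append((history + [action], total + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:width]  # pruning: discarded states never come back
    return beam

greedy_path, greedy_logp = greedy_decode(2)
beam_results = beam_decode(2, width=2)
```

In this toy, greedy commits to "A" first and ends up with a lower-scoring sequence than the beam's best. The pruning step is also why the final beam is not guaranteed to hold the true n best sequences: a prefix dropped early can never be recovered.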


Yes, I do greedy parsing. There are many advantages to this, which I'll write about before too long. Fundamentally, it's "the way forward". As models improve in accuracy, search matters less and less.

By the way, the beam isn't slow from the copying. That's really not so important. What matters is simply that you're evaluating, say, 8 times as many decisions. This makes the parser 6-7x slower (you can do a little bit of memoisation).


In that case, I wonder if it can output a probability score for each tag at each position, like pycrfsuite does? Then the output could be ensembled with other taggers, or the confidence information could be passed downstream in some other way.

Also, maybe a dumb question - is there any library or best-practice method for the ensembling of taggers / chunkers? Or must I create it myself from scratch?
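For what it's worth, one simple do-it-yourself approach is to average the per-token tag distributions. The sketch below assumes every tagger can emit a `{tag: probability}` dict per token over the same tokenization; `ensemble_tags` and the numbers are hypothetical and not taken from any library.

```python
from collections import defaultdict

def ensemble_tags(distributions_per_tagger):
    """Average per-token tag distributions from several taggers.

    distributions_per_tagger holds one list per tagger, and each list holds
    one {tag: probability} dict per token. All taggers must cover the same
    tokens in the same order.
    """
    n_taggers = len(distributions_per_tagger)
    result = []
    for token_dists in zip(*distributions_per_tagger):
        averaged = defaultdict(float)
        for dist in token_dists:
            for tag, prob in dist.items():
                averaged[tag] += prob / n_taggers
        best = max(averaged, key=averaged.get)
        result.append((best, averaged[best]))
    return result

# Made-up scores for the tokens "old" and "man" in "The old man the boat".
tagger_a = [{"ADJ": 0.7, "NOUN": 0.3}, {"NOUN": 0.6, "VERB": 0.4}]
tagger_b = [{"ADJ": 0.2, "NOUN": 0.8}, {"NOUN": 0.3, "VERB": 0.7}]
combined = ensemble_tags([tagger_a, tagger_b])
```

Averaging treats every tagger as equally trustworthy. Weighting each tagger by its held-out accuracy is an obvious variation, and majority voting is the fallback when a tagger only emits hard labels.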


How do you expect fruit to fly?



