Hacker News

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html




As usual, dang is wrong and not moderating effectively. This is not a shallow comment but a legitimate concern about spaCy, and to a lesser extent other NLP tools such as NLTK. Most of the tooling around them that people end up using really is nothing more than wrappers around other tools. See the default tokenizers or models utilized by these tools.

And yes, even if spaCy itself is not making money, you can bet that the other, paid tools they sell are.


Actually, if the GP had posted this critique instead of a shallow, reductionist internet dismissal ("just want to sell the hype"), that would have been fine. Thoughtful critique is welcome; it just requires higher-quality comments than that.


spaCy sells their service; it's not free. From https://explosion.ai/about:

"In August 2021, we sold 5% of Explosion to SignalFire for $6 million. Employees are given a stake in Explosion using a virtual share bonus program."


NLTK might not be the best example of this: it was originally written for pedagogical purposes[1], and it was sorely needed, too. In 2006 it was very difficult for a student in an intro to NLP class to track down and use existing implementations of various algorithms. NLTK continues to be useful for similar reasons today, as it provides the only relatively usable interface to some valuable but poorly engineered (by modern standards) academic resources such as FrameNet.

Anyway, the original commenter seems to presuppose that there's no value in collating and polishing existing resources. Isn't that all, say, a Linux distro is? Is it true that "[Canonical] just wrap all the open source [code] into a [distro] and just want to sell the hype"? If curating and polishing seem to have value on the market, maybe we ought to at least entertain the possibility that this value is not an illusion created by marketing. That's why I agree with dang that OC's comment is a facile dismissal.

It's also worth noting that spaCy isn't just a set of wrappers. spaCy's tokenizer, which is an original work, is used in at least two cutting-edge academic NLP libraries, AllenNLP (@AI2)[2] and Stanza (@Stanford)[3], presumably because using spaCy's tokenizer was better than the alternatives.

[1]: https://aclanthology.org/P06-4018.pdf

[2]: https://github.com/allenai/allennlp/blob/e0ee7f43d5da973e77d...

[3]: https://github.com/stanfordnlp/stanza/blob/68aa42653d656f613...


spaCy makes lots of money. From https://explosion.ai/about: "In August 2021, we sold 5% of Explosion to SignalFire for $6 million. Employees are given a stake in Explosion using a virtual share bonus program."



