
There are a few applications missing:

- Answering a question by returning a search result from a large body of texts. E.g. "How do I change the background color of a page in Javascript?"

- Improving the readability of a text. The article only mentions "understanding how difficult to read is a text".

- Establishing relationships between entities in a body of text. E.g. we could build a fact-graph from sentences like "Burning coal increases CO2", and "CO2 increase induces global warming". Useful also in medical literature where there are millions of pathways.

- Answering a question, using a large body of facts. Like search, but now it gives a precise answer.

- Finding and correcting spelling/grammatical errors.




> - Establishing relationships between entities in a body of text. E.g. we could build a fact-graph from sentences like "Burning coal increases CO2", and "CO2 increase induces global warming". Useful also in medical literature where there are millions of pathways.

That's a simple example because with 'CO2' you at least have the same string that can serve as a keyword connecting those two facts. Usually in natural language we make frequent use of anaphora to refer to people, objects and concepts previously mentioned in the text by name.

Anaphora resolution is one of the really hard problems not only in NLP but in linguistics in general. The simplest anaphoric device in languages like English is the pronoun, and even with pronouns it can be quite difficult to determine what a 'he' or 'she' refers to in context.


>Anaphora resolution is one of the really hard problems not only in NLP but in linguistics in general.

This was one of the most frustrating parts of studying Latin rhetoric. The speakers would keep referring to "That thing I was talking about," and it's a noun from a subordinate clause 2 and a half paragraphs ago.


That’s actually very common in most languages. English is one of the few western languages that doesn’t do this, which makes it quite complicated for some people to write sentences in it, as in their native language such distant back-references and long run-on sentences may be a lot more common.


> - Answering a question by returning a search result from a large body of texts. E.g. "How do I change the background color of a page in Javascript?"

> - Answering a question, using a large body of facts. Like search, but now it gives a precise answer.

That is essentially a Natural Language Interface. There are simple ways to implement one for bots that receive simple commands[1]. The problem is that it quickly becomes very hard if you are trying to do something more open-ended than a bot. So, there was simply no room to include it.
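To give an idea of what "simple" means for a command bot, here is a minimal sketch that maps a sentence to an intent with regular expressions. The intents and patterns are made up for illustration and are not taken from the linked article.

```python
import re

# Hypothetical intents and phrasings, chosen only for illustration.
PATTERNS = [
    ("set_timer", re.compile(r"\bset (?:a )?timer for (\d+) (second|minute|hour)s?\b", re.I)),
    ("weather", re.compile(r"\bweather in ([A-Za-z ]+?)\??$", re.I)),
]


def parse_command(text):
    """Return (intent, captured_groups) for the first matching pattern, else (None, ())."""
    for intent, pattern in PATTERNS:
        match = pattern.search(text.strip())
        if match:
            return intent, match.groups()
    return None, ()


print(parse_command("Please set a timer for 10 minutes"))
print(parse_command("What's the weather in Rome?"))
print(parse_command("Why is the sky blue?"))
```

The last call matches nothing, which is the point: every new kind of question needs another hand-written pattern, so this approach does not scale to open-ended questions.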

> - Improving the readability of a text. The article only mentions "understanding how difficult to read is a text".

The issue is that the formulas to measure the readability of a text cannot really be used to suggest improvements. That's because the user ends up focusing on improving the score instead of improving the text. To suggest improvements you need a much more sophisticated system.
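A rough sketch of the problem, using the Flesch Reading Ease formula as one example of such a formula (the thread does not say which formulas the article covers). The syllable counter is a crude vowel-run heuristic and both sample texts are invented.

```python
import re


def count_syllables(word):
    """Crude heuristic: count runs of vowels, with a minimum of one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean the text is rated easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


original = "The cat sat on the mat because it was tired and the floor was cold."
chopped = "The cat sat on the mat. It was tired. The floor was cold."

print(flesch_reading_ease(original))
print(flesch_reading_ease(chopped))
```

The chopped version scores higher, yet it has lost the "because" that explained why the cat was on the mat. The formula rewards short sentences and short words, not clarity, so it can tell you a number but not what to change.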

> - Establishing relationships between entities in a body of text. E.g. we could build a fact-graph from sentences like "Burning coal increases CO2", and "CO2 increase induces global warming". Useful also in medical literature where there are millions of pathways.

This is one of the things that were axed, because in some sense it is simple if you just want to link together concepts without any causality, i.e. stuff that happens together. To do that you could combine named entity recognition (to find entities) with a simple way to find a relationship between words (i.e., they occur in the same phrase, therefore they are related). However, a more sophisticated form of the process, like the one that results in the Knowledge Graph[2], would be quite hard to do.
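A minimal sketch of the simple version, assuming a fixed list of entities as a stand-in for a real named entity recognizer. The entity list and the helper names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Stand-in for real named entity recognition: a fixed, hypothetical entity list.
KNOWN_ENTITIES = ["burning coal", "CO2", "global warming"]


def find_entities(sentence):
    """Return the known entities that appear in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return [e for e in KNOWN_ENTITIES if e.lower() in lowered]


def cooccurrence_edges(text):
    """Count how often each pair of entities appears in the same sentence."""
    edges = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, b in combinations(sorted(find_entities(sentence)), 2):
            edges[(a, b)] += 1
    return edges


text = "Burning coal increases CO2. CO2 increase induces global warming."
edges = cooccurrence_edges(text)
for pair, count in edges.items():
    print(pair, count)
```

This links "burning coal" to "CO2" and "CO2" to "global warming", but the edges carry no direction and no causality. It also only works here because "CO2" is repeated verbatim; a pronoun in the second sentence would break the link.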

> - Finding and correcting spelling/grammatical errors.

That's a great idea, we will add how to detect spelling errors.

[1] https://medium.com/swlh/a-natural-language-user-interface-is...

[2] https://en.wikipedia.org/wiki/Knowledge_Graph


The fact that those things are hard is exactly why a guide on them would be valuable.


That's true up to a point. We wrote the article for programmers that had no previous knowledge, so we avoided stuff that is too hard. To such people stuff that is too advanced would look cool, but it would also be impractical to use.

However, we are thinking about creating a more advanced article at a later date.


Author profiling comes to mind as well


- Text generation and dialogue systems



