Training agents to invent a language (openai.com)
163 points by runesoerensen on March 18, 2017 | 21 comments



Luc Steels was doing this years ago (since mid-1990s IIRC):

https://www.google.com/search?client=opera&q=luc+steels+inve...

Ah, yes, I see that a paper referenced on the website:

"A Paradigm for Situated and Goal-Driven Language Learning Jon Gauthier, Igor Mordatch"

grants a footnote to one of Steels' 2012 publications, stating:

"A related line of work in evolutionary linguistics constructs a similar language learning scenario entirely without fixed-language agents (Smith et al., 2003; Steels, 2012; Kirby et al., 2014). All of the agents in these environments construct a novel language simultaneously to accomplish some shared task. This is an interesting separate line of research, but ultimately a separate task from the understanding and acquisition problems discussed in this abstract."


There is no question people were trying related approaches in the past (as the paper cites!). Indeed, it's exceedingly rare to find a totally new idea; we always stand on the shoulders of giants, which include, but are not limited to, Steels's work. The goal, both here and in general, is to go beyond previous work — in the case of this work, to use modern DL tools to learn a more sophisticated language that has a real degree of compositionality and grounding. The hope is that by pushing this approach to its logical limit, we'll get agents that can really understand language, both artificial and natural.


There was one study, together with Paul Vogt, which also allowed agents to establish which sensory channels to use for communication.

This is language grounding in a very pure way.

I think it's important not to restrict those channels. We communicate nonverbally, by touch, by smiling, by whistling, by scratching, by walking funny, by writing, by painting, by love.


It sounds like you're frustrated that deep learning is rediscovering stuff that your field discovered years ago. I've come to the conclusion that the only solution to that is to try to communicate more effectively; the reason they are not talking to us is that they can't understand what we're saying.


It's a little premature to brag about the success of this or that (still poorly understood) technique. The agents in the article have created very simple languages that are often not necessary for communication; for instance, when they're not allowed to use words, they still achieve their goals by guiding other agents or pushing them around. Also, researchers had to impose very rigid constraints on the type of language the agents were learning, because, of course, those agents "live" in very limited environments, conducive to very "unnatural" languages (e.g. ones composed of single words for complex combinations of actions, etc).

The main intuition behind this technique is that we can make it much easier for an agent to learn the kind of language we want it to learn, by associating a vocabulary with a given environment, rather than giving it a vast corpus of text and hoping beyond hope that it will learn exactly the kind of language we want it to from a potentially infinite number of similar languages that could be learned from that corpus. This is an extremely cool idea, but if it turns out that we've just replaced the work of collating a huge corpus with the work of imposing limitations on the types of language that can be learned, it's still possible that it won't scale up much better than supervised learning from huge annotated corpora.

Like I say, this is an extremely cool idea that can lead to great advances, but getting agents to create languages with the full expressive power of human language is still worlds away.

In any case, when is it a good idea to insult the intelligence of an entire (unnamed and only assumed) field?


Why the frustration? We are progressing, are we not? A sign of us making no progress would be when the new guys keep disproving the old theories.


> when the new guys keep disproving the old theories

Most old theory seems to be incommensurable with deep learning's results, so it's hard to evaluate whether this might be true.


>> the reason they are not talking to us is that they can't understand what we're saying.

Also, who is "they" and who are "us"? This is a very unreasonable and very inappropriate comment.


Hot damn that's meta.


Implications:

- Every Agent can have its own language best suited for work in its domain. It can then auto-translate that to English at the end. This introduces ambiguity once, at the final translation, instead of at every step of the work as it would if all of it were done in English. Like doing math in floating point and converting to int once at the end, vs doing the math in int the whole time (see the sketch after this list). Concrete example: Law.

- Agents could evolve a language that averages out the Sapir-Whorf biases of known human languages. This new language would help humans understand each other better. Concretely, it could be used as an intermediary in UN meetings, to and from which known languages could be auto-translated, mitigating learned Sapir-Whorf biases.

- As @amelius alluded to, Agents could automate the process of learning alien languages: what we saw in the film Arrival (Ted Chiang's short story is so much better).
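To make the floating-point analogy from the first point concrete, here is a toy Python sketch (all numbers made up) of why converting once at the end loses less than converting at every step:

    values = [0.6] * 10  # ten steps of "work", each worth 0.6

    # Convert to int at every step -- the rounding error compounds.
    stepwise = 0
    for v in values:
        stepwise += int(round(v))  # 0.6 rounds up to 1 every time

    # Do all the work in float, convert once at the end.
    once = int(round(sum(values)))

    print(stepwise, once)  # 10 vs 6 -- per-step conversion drifted far from the true total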


I certainly would love to read a list of Sapir-Whorf biases included in English!


Biases against what baseline? A language optimized to express a finite set of things?


Yeah, there is no objective baseline. However, if the Agent could evolve a language that averages out the biases of all human languages, that could be used as the baseline, couldn't it?


Some humans cannot make mouth sounds; some humans don't have arms. You already can't hit that baseline with one language.

As somebody who has done a great deal of idle bloviation about ideal languages, I can say that there are many obstacles. For example, you need to add a new noun because a new material is discovered and it's getting very popular. You want a short and efficient name for it, but all the codeword space is already allocated. So you add it to your table of weights, and suddenly every prefix-coded word is pronounced and written completely differently.

If you can't use prefix trees, how will you allocate the words efficiently?
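To make this concrete: the standard way to allocate prefix-free codewords from usage frequencies is Huffman coding, and a toy Python sketch (frequencies made up) shows how one popular newcomer reshuffles nearly every existing codeword:

    import heapq
    from itertools import count

    def huffman_codes(freqs):
        """Build a prefix-free binary code from a {word: frequency} table."""
        tiebreak = count()  # keeps heap comparisons well-defined
        heap = [(f, next(tiebreak), {w: ""}) for w, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {w: "0" + code for w, code in c1.items()}
            merged.update({w: "1" + code for w, code in c2.items()})
            heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
        return heap[0][2]

    freqs = {"water": 40, "stone": 30, "wood": 20, "iron": 10}
    print(huffman_codes(freqs))
    freqs["graphene"] = 35  # the popular new material arrives
    print(huffman_codes(freqs))  # most codewords change out from under you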


I really like the introduction of "cost". Have you considered having a multi-variable time decay distribution function for all learning so that the agents need to reinforce every lesson?

Our brains forget over time because energy is limited and new connections take energy from older memories. So you could introduce a global energy variable that is the ultimate denominator.

Lastly, have you considered connecting your agents to people in a one-to-one public channel, where an agent asks a person about the "meaning of objects" and collects opinions from people to form its own opinion?
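For what it's worth, here is a toy Python sketch of the kind of decay-plus-reinforcement scheme I mean (the class name and all constants are hypothetical):

    import math

    class DecayingMemory:
        def __init__(self, half_life=5.0):
            self.decay = math.log(2) / half_life  # per-step decay rate
            self.strength = {}   # lesson -> strength at last reinforcement
            self.last_seen = {}  # lesson -> time of last reinforcement

        def recall(self, lesson, t):
            if lesson not in self.strength:
                return 0.0
            dt = t - self.last_seen[lesson]
            return self.strength[lesson] * math.exp(-self.decay * dt)

        def reinforce(self, lesson, t, amount=1.0):
            self.strength[lesson] = self.recall(lesson, t) + amount
            self.last_seen[lesson] = t

        def normalized(self, t, energy=1.0):
            """Share one fixed energy budget across all memories (the 'denominator')."""
            raw = {k: self.recall(k, t) for k in self.strength}
            total = sum(raw.values()) or 1.0
            return {k: energy * v / total for k, v in raw.items()}

    m = DecayingMemory()
    m.reinforce("red = landmark A", t=0)
    m.reinforce("blue = landmark B", t=8)
    print(m.normalized(t=10))  # the older, unreinforced lesson has faded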


Regarding memory/forgetting, this was published yesterday:

https://deepmind.com/blog/enabling-continual-learning-in-neu...
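For context, that post is about elastic weight consolidation: while training on a new task, each parameter is anchored to its old value in proportion to how important it was (its Fisher information) for the old task. A minimal Python sketch of the penalty (toy numbers, not DeepMind's code):

    def ewc_loss(task_b_loss, params, params_star_a, fisher_a, lam=0.4):
        """total = new-task loss + lambda/2 * sum_i F_i * (theta_i - theta*_i)^2"""
        penalty = sum(
            f * (p - p_star) ** 2
            for f, p, p_star in zip(fisher_a, params, params_star_a)
        )
        return task_b_loss + (lam / 2.0) * penalty

    # parameter 0 mattered for task A (high Fisher weight), parameter 1 did not
    print(ewc_loss(task_b_loss=1.0,
                   params=[0.9, -0.5],
                   params_star_a=[0.2, 0.3],
                   fisher_a=[5.0, 0.01]))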


Also interesting:

https://github.com/iassael/learning-to-communicate

and the paper: https://arxiv.org/abs/1605.06676 "Learning to Communicate with Deep Multi-Agent Reinforcement Learning"


Yes, they both seem to be focused on the differentiable inter-agent communication aspect. I wonder if this is related to the recent articles on how honey bees communicate to one another what they have learned about pulling strings, rolling balls, etc. (e.g. on HN https://news.ycombinator.com/item?id=13723645)
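For anyone unfamiliar: "differentiable" there means the message between agents is a real-valued signal during training, so the listener's gradient flows back through the channel into the speaker. A rough PyTorch sketch of that paper's DIAL idea (layer sizes and noise scale are arbitrary):

    import torch
    import torch.nn as nn

    speaker = nn.Linear(4, 1)   # agent 1: observation -> message
    listener = nn.Linear(1, 2)  # agent 2: message -> action logits

    obs = torch.randn(1, 4)
    message = torch.tanh(speaker(obs))                   # continuous while training
    message = message + 0.1 * torch.randn_like(message)  # channel noise, which DIAL
                                                         # uses to push messages
                                                         # toward discreteness
    logits = listener(message)
    loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
    loss.backward()

    # The speaker received a learning signal through the channel:
    print(speaker.weight.grad)
    # At execution time the message would be discretized, e.g. (message > 0).float()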


So many opportunities here! What if they let the agents just spout out random characters and let the other agents listen?

They also talk about a lot of ways to get the agents to stop "cheating," but I think a better method would just be better goals.


Yes, that's mostly what they did. E.g. they adjusted the goal to favor fast communication, and then imposed a cost on ever-expanding vocabularies. Both of these sound very reasonable.
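A rough Python sketch of how both pressures could enter a shared reward (the cost terms and constants are mine, not the paper's):

    def shaped_reward(task_reward, utterance, known_vocab,
                      time_cost=0.05, new_word_cost=0.5):
        # favor fast communication: every extra symbol costs a little
        brevity_penalty = time_cost * len(utterance)
        # discourage vocabulary growth: coining a brand-new word costs a lot
        vocab_penalty = new_word_cost * len(set(utterance) - known_vocab)
        known_vocab.update(utterance)
        return task_reward - brevity_penalty - vocab_penalty

    vocab = set()
    print(shaped_reward(1.0, ["goto", "red"], vocab))  # -0.1: paid to coin two words
    print(shaped_reward(1.0, ["goto", "red"], vocab))  #  0.9: reuse is cheap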


It's interesting that a lot of added constraints mirror "real-world" scenarios. We don't have an intrinsic global co-ordinate system, and brevity/laziness is generally rewarded (which results in Zipf's Law in natural language).
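If you want to eyeball Zipf's law yourself, rank-frequency is a few lines of Python ("corpus.txt" is a stand-in for any large text file):

    from collections import Counter

    # Zipf: frequency of the r-th most common word falls off roughly as 1/r,
    # so freq * rank should stay roughly constant down the list.
    words = open("corpus.txt").read().lower().split()
    for rank, (word, freq) in enumerate(Counter(words).most_common(10), start=1):
        print(word, freq, round(freq * rank))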



