Playing Atari with Six Neurons (arxiv.org)
169 points by togelius on June 6, 2018 | 30 comments



As someone working on a reinforcement learning/neuroevolution problem right now, I find this extremely exciting. Fewer parameters, ceteris paribus, are always better, and the fact that the experiments in this paper were run on a single workstation, rather than on a massive farm of TPUs à la AlphaGo, implies quicker development iteration and greater accessibility for the average researcher.

The staging of components in this paper (compressor/controller), where neuroevolution is only applied to a low-dimensional controller, reminds me of Ha and Schmidhuber's recent paper on world models (which is briefly cited) [1]. They employ a variational autoencoder with ~4.4M parameters, an RNN with ~1.7M parameters, and a final controller with just 1,088 parameters! Though it's recently been shown that neuroevolution can scale to millions of parameters [2], the technique of applying evolution to as few parameters as possible and supplementing with either autoencoders or vector quantization seems to be gaining traction. I hope to apply some of the ideas in this paper to multiple co-evolving agents...

[1]. https://worldmodels.github.io

[2]. https://arxiv.org/abs/1712.06567
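
To make the appeal concrete, here's a toy sketch of the compressor/controller staging (entirely my own illustration; the hand-rolled encoder, objective, and mutation scheme are made up, not from either paper). An accept-if-better evolutionary loop searches only two controller weights on top of a fixed encoder; evolution never touches the compressor.

```ruby
# Fixed stand-in "compressor": squeezes a raw observation to 2 features.
ENCODE = ->(obs) { [obs.sum / obs.size.to_f, obs.max - obs.min] }

# Fitness of a tiny linear controller acting on the encoded features.
def fitness(weights, obs)
  z = ENCODE.call(obs)                          # low-dimensional code
  action = weights.zip(z).sum { |w, f| w * f }
  -((action - 1.0)**2)                          # toy target: action == 1.0
end

obs  = [0.2, 0.8, 0.5, 0.1]
best = Array.new(2) { rand - 0.5 }              # only 2 evolved parameters
best_fit = fitness(best, obs)
500.times do
  cand = best.map { |w| w + (rand - 0.5) * 0.2 }  # crude mutation
  f = fitness(cand, obs)
  best, best_fit = cand, f if f > best_fit        # (1+1)-style accept-if-better
end
```

The point is just that the search space seen by evolution stays two-dimensional no matter how big the raw observation is.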


You may be interested in an even older paper: http://www.idsia.ch/~juergen/icdl2011cuccu.pdf


Thanks so much! I read this (and a few related papers) today. Besides the novel algorithm discussed in the new Atari paper, do you have a reference implementation of online vector quantization you might be able to recommend? I think I could probably figure it out from the paper alone, but sometimes it's nice to see code other people have already optimized. :)


Uhm, unfortunately I do not; I could search for some on Google, but I doubt I would fare better than you at it. I went ahead and coded my own version, and it is quite straightforward. You can find it here: https://github.com/giuse/machine_learning_workbench/blob/mas... Although it's polluted by research trial and error, you can easily pick out the minimal code necessary to run it. Here's an example of how to use it: https://github.com/giuse/machine_learning_workbench/blob/mas... Let me know if that works for you or if you have further questions!
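
In the meantime, here's a minimal sketch of the online VQ idea (a toy illustration, much simpler than the workbench code): each incoming vector is encoded as the index of its nearest centroid, and that centroid is then nudged toward the vector by a fixed learning rate.

```ruby
# Toy online vector quantization: train and encode one observation at a time.
class OnlineVQ
  attr_reader :centroids

  def initialize(n_centroids:, dims:, lrate: 0.1)
    @lrate = lrate
    @centroids = Array.new(n_centroids) { Array.new(dims) { rand } }
  end

  # Encode `x` as the index of its nearest centroid, then move that
  # centroid a step toward `x` (the online update).
  def train_encode(x)
    idx = nearest(x)
    c = @centroids[idx]
    c.each_index { |i| c[i] += @lrate * (x[i] - c[i]) }
    idx
  end

  private

  # Index of the centroid with smallest squared distance to `x`.
  def nearest(x)
    dists = @centroids.map { |c| c.each_index.sum { |i| (x[i] - c[i])**2 } }
    dists.index(dists.min)
  end
end
```

Usage: `vq = OnlineVQ.new(n_centroids: 4, dims: 2)` and then `vq.train_encode(obs)` per observation; the returned index is the compressed code.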


That’s excellent! Thanks!


>I hope to apply some of the ideas in this paper to multiple co-evolving agents...

Care to elaborate?


Cool article, lots to digest; one thing caught my eye:

"To the best of our knowledge, the only prior work using unsupervised learning as a pre-processor for neuroevolution is (cite)."

Just amazing how much low-hanging fruit there still is in the space.


Author here. The idea is low-hanging indeed; several friends (including @togelius!) commented "I always wanted to do that -- eventually". Realizing it is another matter. Have a look at the mess necessary to make it work: we had to discard UL initialization in favor of online learning, accept that the encoding would grow in size, adapt the network sensibly to these changes, and tweak the ES to account for the extra weights.


I have been wolfing down RL articles, videos, and publications for some time now, after an intro to deep learning via Manning's Deep Learning. While the overall concept of RL is easy to grasp (agents, actions, state, etc.), some of the finer details and processes are quite confusing.

I am tempted to blame inconsistent terminology and implementations for this lack of understanding, but I suspect it has more to do with approaching this field through the lens of a developer rather than a researcher or academic: trying to understand the code without fully grasping the "science" of the mechanisms.

Either way, if you feel you're in a similar spot, check out this resource: https://reinforce.io and its GitHub repo: https://github.com/reinforceio/tensorforce.

Just reading through their code and documentation has made a lot of the concepts clearer.

And a few more resources I found really helpful: http://karpathy.github.io/2016/05/31/rl/ https://www.analyticsvidhya.com/blog/2017/01/introduction-to... https://www.oreilly.com/ideas/reinforcement-learning-with-te...

Edit: The point I forgot to mention was that I always feel like I'm playing catch-up, since the amount of new content being released exceeds what I can absorb.



Ruby should be perfectly legible with a Python background, but for any question just ping me on twitter. I would be happy to build a dialog :)


Just curious, why did you pick ruby over python? Personal familiarity?


Sure, familiarity matters. But I believe in the rational reasons that brought me back to this tool over and over, until I built that familiarity indeed :) I posted an over-long discussion of it on Reddit; feel free to read as little as you need ;) https://www.reddit.com/r/MachineLearning/comments/8p1o8d/r_p...


It's particularly interesting that they've chosen to wrap Python using PyCall. I'd love to hear about the tradeoffs of that.


Sure! It's quite simple: it works like a charm, completely transparently. You `import` with Python-like syntax, and you get a Ruby object that transparently forwards any message (i.e. method call) to the corresponding Python object on the underlying Python interpreter.

This means that Ruby does not need to know _anything_ about the Python object: whatever you call on the Ruby object is just forwarded to the Python one, and whatever result is passed back to Ruby.

About the overhead, I sincerely do not know. I expected some, so my code does part of the image pre-processing directly in Python (`narray`) in order to pass a smaller object to Ruby, but beyond that I could perceive none -- grain of salt advised, as any overhead was possibly hidden from me because my computation in Ruby was orders of magnitude more complex and time-consuming than what was going on in the Python interpreter.

Definitely ping Murata-san on either GitHub https://github.com/mrkn/pycall.rb/ or Twitter `@mrkn`; I will send him a link to this thread so he can contribute if he feels like it. Personally, I am a fan of his work and elegant approach, and I owe him for enabling me to keep working in Ruby while everybody else publishes code in Python :)


Sounds great - I also strongly prefer Ruby, and have tended to avoid Python code because I didn't expect an easy way of wrapping it, but I will definitely have to play with PyCall.


Uhm, maybe I should have pointed this out earlier, but the algorithms' implementations can be found (independent of deep neuroevolution) in my Ruby machine learning workbench repo (in turn imported by DNE): https://github.com/giuse/machine_learning_workbench


... and that's three more than the average Atari marketing exec had back then. No wonder they had trouble understanding the game industry :-)


Okay, a true story: my second disappointing interaction with Atari marketing.

One fine day my boss came to me and said that he had an ask from Atari Marketing (in the Home Computer arm of the company).

The marketing drone came to my office (yes, we had offices in those days). "My idea is to pre-copyright all possible 8x8 bitmaps so that people can't use them without our permission. Can you print them out for me so we can submit them to the copyright office?" He actually meant all possible 8x8 bitmaps containing five colors, with the colors chosen from a 7- or 8-bit space (I forget which).

I told him the story of the guy who supposedly invented chess and was offered a choice of reward by his king. The fellow simply asked: "Just give me one grain of rice for the first square, two grains for the second, four for the third, and so on." Most of you know how this ends; it's grade-school math.

I explained to the marketing guy that the printout would probably outweigh the planet, maybe the solar system, maybe the galaxy. He went away, a little disgusted with those pesky engineers. (I don't know if he was the same oxygen waster who wanted me to write a 16K cartridge in just a couple of weeks, but he certainly was in the same department).

So I'm still sticking with three brain cells, despite all the downvotes :-)


I was curious, so I threw it into Wolfram alpha:

(weight of a sheet of paper) * 5^64

≈ 0.4 × Milky Way mass

Almost half the galaxy mass, that's a lot of grains of rice!
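
For a sanity check, the same arithmetic by hand (assuming ~5 g per sheet and a Milky Way mass of roughly 6e42 kg including dark matter, which is about what Wolfram uses):

```ruby
sheets       = 5**64              # one sheet per 5-color 8x8 bitmap
total_kg     = sheets * 0.005     # ~5 g per printed sheet
milky_way_kg = 6.0e42
ratio = total_kg / milky_way_kg   # roughly 0.45, close to Wolfram's figure
```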


Actually it's a LOT more than that, since each of the 5 colors has 128 possible values. There's some duplication (the same color in multiple slots, trivial rotations and reflections, and such), but I think the order of magnitude is probably "cluster of galaxies" at a minimum :-)
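
Rough numbers for that version (my own estimate; assuming a 7-bit palette, ordered color slots, and ignoring duplicates and symmetries):

```ruby
palettes = 128**5                 # 5 color slots, 128 values each
bitmaps  = palettes * 5**64       # times the pixel assignments
total_kg = bitmaps * 0.005        # ~5 g per sheet gives on the order of 1e53 kg
# A large galaxy cluster is on the order of 1e45 kg, so this is many
# millions of clusters' worth of paper.
```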


Wow, that increases it by a lot; it's much, much heavier than the mass of the visible universe [0]. The universe is ~1e50 kg; the marketing exec's request was ~1e500 kg. Now you can go back in time and tell them just how wrong they were ;)

[0] http://www.wolframalpha.com/input/?source=frontpage-immediat...


OK, you're so grayed out but your bio says you've been programming since '79 and you've written games for Atari. So perhaps all we need is some elaboration? They seem like a successful company, don't they?


There is not really much, if anything, left of the original Atari. It pretty much failed after the video game crash in 1983, was split in two, and bounced around various owners.

The Atari that brought out the Atari ST etc. was one of those; it too pretty much failed, and Tramiel merged it into JTS and later sold the remains to Hasbro, which then sold them to Infogrames Entertainment. The current Atari Inc. used to be Infogrames and just licensed the name; Infogrames Entertainment itself then renamed itself Atari SA.

The other part of the original Atari, Atari Games Inc., failed in 2003. As far as I know, the intellectual property of that division is now owned by Warner.


Story time maybe? Atari were successful for a while but they crashed pretty hard.



Ooh, thanks - I read a bit of the blog but didn't find this one!


Stories!


I can post on hacker news with only 4.


Yeah, that's pretty apparent. :)



