Using Keras and Deep Deterministic Policy Gradient to play TORCS (yanpanlau.github.io)
70 points by wei_jok on Oct 11, 2016 | 14 comments



This is cool, but unlike some other recent neural game players [1] [2], it doesn't look at the screen at all. It also has an objective function which is tightly coupled to this game. Therefore this work is not easily generalizable to other video games, nor would it be useful for a real-world self-driving car.

[1] http://vizdoom.cs.put.edu.pl/ [2] https://deepmind.com/research/dqn/


It would be easy to replace the input features with representations from a pixel-level convnet, making this a "real" self-driving car that goes from pixels to commands.

Anyone interested in this type of research: consider cloning the repo and implementing this modification, it would make a great starter project.
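
A rough, untested sketch of what that pixel-level actor might look like in Keras (the layer sizes, the 64x64x3 input, and the steering/acceleration heads are my guesses, not the repo's actual architecture):

    # Hypothetical pixel-based DDPG actor: raw frames -> (steering, acceleration).
    from keras.layers import Input, Conv2D, Flatten, Dense
    from keras.models import Model

    frames = Input(shape=(64, 64, 3))                    # raw TORCS frame instead of hand-picked sensors
    x = Conv2D(32, (8, 8), strides=4, activation='relu')(frames)
    x = Conv2D(64, (4, 4), strides=2, activation='relu')(x)
    x = Conv2D(64, (3, 3), strides=1, activation='relu')(x)
    x = Flatten()(x)
    x = Dense(256, activation='relu')(x)
    steering = Dense(1, activation='tanh')(x)            # steering in [-1, 1]
    accel = Dense(1, activation='sigmoid')(x)            # acceleration in [0, 1]
    actor = Model(inputs=frames, outputs=[steering, accel])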


It is quite easy to change the input features to pixels and feed them into a convnet under Keras (that's why I love Keras so much). However, gym_torcs only supports 64x64 pixels, which is hard to see even with human eyes, IMHO.

https://github.com/ugo-nama-kun/gym_torcs/issues/4
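
For reference, grabbing that 64x64 vision observation looks roughly like this (untested; I'm assuming the env exposes the frame as ob.img when constructed with vision=True, and field names may differ between versions of gym_torcs):

    # Rough sketch: pull the 64x64 frame out of gym_torcs and normalise it.
    import numpy as np
    from gym_torcs import TorcsEnv

    env = TorcsEnv(vision=True, throttle=True)
    ob = env.reset(relaunch=True)

    frame = np.asarray(ob.img, dtype=np.float32).reshape(64, 64, 3) / 255.0
    action = np.array([0.0, 0.5])                        # [steering, acceleration] placeholder policy
    ob, reward, done, _ = env.step(action)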


Interesting, yeah that seems a bit small. Would be a fun project to fix!


Projects like these remind me of why I got into AI research in the first place.


Very interesting.

Given enough training, would the car learn to find the apexes of turns?


IMHO yes.

It is probably an easy local maximum with relatively fast convergence.

Far harder are:

a. "planning" - finding the optimal path through sequential turns

b. generalizing the learned experience to a new, unseen situation

Would love to get feedback from the author on this.


No, the reward function rewards staying in the middle of the track.


Staying in the middle of the track is not a necessary part of the reward function. I include it to speed up learning in the beginning. You can remove it once the agent has learned a reasonable policy and see if it can find the optimal apex path. I will run a test tonight.

Just like in the human world: you first learn how to drive before you learn how to drift the car.
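
For reference, the kind of reward function being discussed looks roughly like this (a sketch assuming the observation exposes speedX, angle, and trackPos as in gym_torcs; not the exact code from the repo):

    import numpy as np

    def compute_reward(ob, keep_centre=True):
        # Reward forward progress along the track axis, penalise sideways speed,
        # and (optionally) penalise drifting away from the centre line.
        progress = ob.speedX * np.cos(ob.angle)
        sideways = np.abs(ob.speedX * np.sin(ob.angle))
        r = progress - sideways
        if keep_centre:
            r -= ob.speedX * np.abs(ob.trackPos)         # drop this term to allow apex-hunting lines
        return r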


Can you please explain whether you are training on a SPECIFIC track or a random one?

How would the existing agent fare on a brand-new track?


Hi, I used the Aalborg track as my training set and the Alpine1 track as my validation set. The Alpine1 track is 3 times longer than Aalborg. As you can see in the video, the agent can drive reasonably well on the validation track.


Any news on the test? I'm curious.


Please find the result below. I modified the reward function such that staying in the middle of the track is no longer required.

https://youtu.be/Tb5gASEJIRM


Hi Bluetwo, I am currently travelling to San Francisco. Can you send me an e-mail at yanpan@gmail.com so I can contact you and send you the result directly when I am back in Hong Kong?



