Very interesting. Given enough training, would the car learn to find the apexes ...

yazr · on Oct 11, 2016

IMHO yes.

It is probably an easy local maximum with relatively fast convergence.

Far harder are:

a. "planning" - finding the optimal path through sequential turns

b. generalizing the learned experience to a new, unseen situation

Would love to get feedback from the author on this.

sp332 · on Oct 11, 2016

No, the function rewards staying in the middle of the track.

yanpanlau · on Oct 11, 2016

Staying in the middle of the track is not a necessary requirement in the reward function. I included it to speed up learning at the beginning. You can remove it once the agent has learned a reasonable policy and see if it can find the optimal apex path. I will do a test tonight.

Just like in the human world: you first learn how to drive before you learn how to drift.
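To make the "drop the centring term later" idea concrete, here is a minimal Python sketch of a TORCS-style shaped reward with an optional stay-in-the-middle penalty. The function name `shaped_reward`, its parameters (`speed_x`, `angle`, `track_pos`, `center_weight`) and the exact formula are assumptions for illustration, not the reward function actually used in the project.

```python
import math


def shaped_reward(speed_x: float, angle: float, track_pos: float,
                  center_weight: float = 1.0) -> float:
    """Hypothetical TORCS-style reward (an illustrative assumption).

    speed_x:       forward speed of the car along its own axis
    angle:         angle between car heading and track axis, in radians
    track_pos:     lateral offset from the centre line (0 = centre, +/-1 = edges)
    center_weight: weight of the 'stay in the middle' shaping term;
                   set to 0.0 to drop it once a basic policy is learned
    """
    progress = speed_x * math.cos(angle)      # reward speed along the track
    drift = abs(speed_x * math.sin(angle))    # penalise moving across the track
    off_center = speed_x * abs(track_pos)     # penalise distance from the centre line
    return progress - drift - center_weight * off_center


# Early training: keep the centring term so the agent learns to stay on track.
early = shaped_reward(10.0, 0.0, 0.5)                     # 10 - 0 - 5 = 5.0
# Later: drop it so the agent is free to use the full track width (apexes).
later = shaped_reward(10.0, 0.0, 0.5, center_weight=0.0)  # 10.0
```

With `center_weight=0.0` the only incentives left are forward progress and not sliding sideways, so cutting towards an apex is no longer penalised.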

yazr · on Oct 13, 2016

Can you please explain whether you are learning a SPECIFIC track or a random one?

How would the existing agent fare on a brand new track ?

yanpanlau · on Oct 16, 2016

Hi~ I used the Aalborg track as my training dataset and the Alpine1 track as my validation dataset. The Alpine1 track is 3 times longer than Aalborg. As you can see in the video, the agent drives reasonably OK on the validation dataset.

bluetwo · on Oct 14, 2016

Any news on the test? I'm curious.

yanpanlau · on Oct 25, 2016

Please find the result below. I modified the reward function such that staying in the middle of the track is no longer required.

https://youtu.be/Tb5gASEJIRM

yanpanlau · on Oct 16, 2016

Hi Bluetwo, I am currently travelling to San Francisco. Can you send me an e-mail at yanpan@gmail.com so I can e-mail you the result directly when I am back in Hong Kong?