Staying in the middle of the track is not a necessary requirement of the reward function. I only include it to speed up learning in the beginning. You can remove it once the agent has learned a reasonable policy and see if it can find the optimal apex path. I will run a test tonight.
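For reference, here is a minimal sketch of the kind of reward shaping being discussed, assuming a Gym-TORCS-style observation with `speedX`, `angle`, and `trackPos` fields (the field names, the `keep_centered` flag, and the weighting are illustrative, not the exact code from the post):

```python
import numpy as np

def reward(obs, keep_centered=True):
    """Progress-based reward: longitudinal speed along the track axis
    minus lateral speed. The optional |trackPos| penalty pushes the car
    toward the track center, which speeds up early learning but also
    discourages apex-hugging racing lines once the policy is competent."""
    progress = obs.speedX * np.cos(obs.angle)        # speed along track direction
    drift = np.abs(obs.speedX * np.sin(obs.angle))   # lateral (sideways) speed
    r = progress - drift
    if keep_centered:
        # trackPos is the distance from the track center, normalized to [-1, 1]
        r -= obs.speedX * np.abs(obs.trackPos)
    return r
```

Dropping the centering term (`keep_centered=False`) after an initial training phase would be one way to test whether the agent then discovers apex lines on its own.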
Just like in the human world: you first learn how to drive before you learn how to drift the car.
Hi~ I used the Aalborg track as my training track and the Alpine 1 track as my validation track. Alpine 1 is three times longer than Aalborg. As you can see in the video, the agent drives reasonably well on the validation track.
Hi Bluetwo, I am currently travelling to San Francisco. Can you send me an e-mail at yanpan@gmail.com so I can contact you and e-mail you the results directly when I am back in Hong Kong?
Given enough training, would the car learn to find the apexes of turns?