
No, the function rewards staying in the middle of the track.


Staying in the middle of the track is not a necessary requirement of the reward function. I only include it to speed up learning in the beginning. You can remove it once the agent has learned a reasonable policy and see whether it finds the optimal apex path. I will run a test tonight.

Just like in the human world: you first learn how to drive before you learn how to drift.
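
In case it helps readers follow the discussion, here is a minimal sketch of the kind of reward shaping being described, using common TORCS-style sensor names (speedX, angle, trackPos). These names and the exact form of the function are assumptions; the author's actual reward may differ.

    import numpy as np
    from collections import namedtuple

    # Hypothetical observation with TORCS-style fields (names are assumptions):
    #   speedX   -- longitudinal speed of the car
    #   angle    -- angle between the car heading and the track axis (radians)
    #   trackPos -- normalized offset from the track centre (0 = centre, +/-1 = edges)
    Obs = namedtuple("Obs", ["speedX", "angle", "trackPos"])

    def reward(obs, keep_center_term=True):
        # Reward forward progress along the track axis.
        r = obs.speedX * np.cos(obs.angle)
        # Penalize sideways velocity (sliding across the track).
        r -= abs(obs.speedX * np.sin(obs.angle))
        if keep_center_term:
            # Shaping term: penalize being off the centre line.
            # Helps early learning but is not strictly required and can be
            # removed once a reasonable policy has been learned.
            r -= abs(obs.speedX * obs.trackPos)
        return r

    # Example: fast, well-aligned car slightly off centre.
    print(reward(Obs(speedX=50.0, angle=0.05, trackPos=0.2), keep_center_term=True))

The centre-line term is a dense shaping signal: it gives the agent a gradient toward sensible behaviour before it has learned to make any forward progress at all, which is why it speeds up early training.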


Can you please clarify: is the agent learning a SPECIFIC track or a random one?

How would the existing agent fare on a brand new track ?


Hi~ I used the Aalborg track for training and the Alpine1 track for validation. The Alpine1 track is 3 times longer than Aalborg. As you can see in the video, the agent drives reasonably well on the validation track.


Any news on the test? I'm curious.


Please find the result below. I modified the reward function such that staying in the middle of the track is no longer required.

https://youtu.be/Tb5gASEJIRM
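
In terms of the reward sketch in the earlier comment, this modification presumably just drops the centre-line shaping term, i.e. the keep_center_term=False case (again only a guess at the actual change):

    # Same sketch as above, with the centre-of-track penalty disabled: the
    # agent is rewarded only for progress along the track axis minus the
    # sideways-velocity penalty, so it is free to choose its own line.
    r = reward(Obs(speedX=50.0, angle=0.05, trackPos=0.2), keep_center_term=False)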


Hi Bluetwo, I am currently travelling to San Francisco. Can you send me an e-mail at yanpan@gmail.com so I can contact you and e-mail you the result directly when I am back in Hong Kong?



