The History of Machine Learning in Trackmania (hallofdreams.org)
119 points by Philpax 2 days ago | 23 comments

This is an awesome overview, and if you want more, most of these projects are documented in an approachable way on YouTube.

Just wanted to provide some perspective here on how many things these projects need to take care of in order to get a training setup going.

I'm the developer behind TMInterface [1], mentioned in this post, which is a TAS tool for the older TrackMania game (Nations Forever). For Linesight (the last project in this post), I recently ended up working with its developers to provide the APIs they need to access the game. There's a lot that RL projects usually want to do: speed up the game (one of the most important), deterministically control the vehicle, get simulation information, navigate menus, skip cutscenes, make save states, capture screenshots, etc. Having each of those implemented natively greatly impacts the stability and performance of training/inference for an RL agent; e.g. the latest version of the project uses a direct capture of the surface that's rendered to the game window instead of an external Python library (DxCam). This is faster, doesn't require any additional setup, and also allows training even if the game window is completely occluded by other windows.
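
To give a rough idea of how those pieces fit together, a training setup typically ends up wrapping them in a gym-style environment, something like this sketch. The `client` method names here are illustrative placeholders, not the actual TMInterface API:

    # Rough gym-style wrapper around a game-control tool like TMInterface.
    # All `client` method names are illustrative placeholders, not the real API.
    import numpy as np

    class TrackmaniaEnv:
        def __init__(self, client, speed=10.0):
            self.client = client
            self.client.set_game_speed(speed)        # run the game faster than realtime
            self.client.disable_fps_throttle()       # keep simulating when the window is unfocused

        def reset(self):
            self.client.restore_save_state("start")  # rewind to the start of the map
            return self._observe()

        def step(self, action):
            self.client.set_input(steer=action[0], gas=action[1], brake=action[2])
            self.client.advance(ticks=5)             # deterministic, fixed-step advance
            state = self.client.get_sim_state()      # speed, position, checkpoints, etc.
            reward = 0.01 * state.speed
            done = state.finished or state.crashed
            return self._observe(), reward, done, {}

        def _observe(self):
            # Direct capture of the surface rendered to the game window,
            # so it keeps working even when the window is occluded.
            return np.asarray(self.client.capture_frame())

Getting each of those calls to be fast and deterministic is where most of the integration work goes.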

There are also many other smaller annoying things: many games throttle FPS if the window is unfocused (which is also the case here), so the tool patches out this behaviour for the project, and there's plenty more like this. The newest release of Linesight V3 [2] can reliably approach world records, and it's being trained and experimented with by quite a few people. The developers made it easy to set up and documented a lot of the process [3].

[1] https://donadigo.com/tminterface/

[2] https://youtu.be/cUojVsCJ51I

[3] https://linesight-rl.github.io/linesight/build/html/


I know your name from falling asleep to Wirtual videos. I think I actually found his content thanks to your collaboration on the cheating scandal. Thanks for all your hard work - it's obvious how significant and beneficial it is within the TM community.

Scientist here; I've bookmarked this article for close reading (so apologies if this question is discussed in the article).

I had a few brushes with RL (with collaborators who knew more RL than I did). A key issue we encountered in different problem settings was the number of samples required to train. We created a headless version of the underlying environment but could not make it go much faster than real time. We also did some work to parallelize, but it wasn't enough (and it was expensive). Is the TM-related RL training happening in real time, or is it possible to speed it up? That seemed like the key problem for making RL widely used, but I'm curious about your thoughts.


I'm not sure about your particular case, but if your environment really is headless, then it should absolutely be possible to run it a lot faster than realtime. It depends on what the environment is and whether you have access to its source code (we don't have that for TrackMania, so it's a lot harder). Either the environment is purposely throttling how much time it simulates, or simulating it simply takes so long that it can't be sped up any further.

We're lucky in the case of TrackMania because it internally has systems to both set the relative game speed and completely disable all rendering to just run physics. Linesight achieves roughly a 10x speedup; most of the time is now spent rendering game frames and running inference on the network. They also parallelize training by running multiple game instances and implementing a training queue. For the "raw" speedup ratios, TM usually achieves about 60x (one minute simulated in one second), and I use this speedup to implement the bruteforce functionality in the tool (coupled with a custom save-state implementation).
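
The parallelization itself is conceptually simple: several game instances feed one queue that the learner drains, roughly like the sketch below. Again, these are made-up names; `TrackmaniaEnv` is the illustrative wrapper from my earlier comment, and `connect_to_instance`, `sample_action` and `update_network` are placeholders, not Linesight's actual code:

    # Minimal sketch of the parallel-rollout pattern: each worker owns one game
    # instance and pushes transitions into a shared queue for the learner.
    import multiprocessing as mp

    def rollout_worker(worker_id, queue):
        env = TrackmaniaEnv(connect_to_instance(worker_id))  # one game instance per worker
        obs = env.reset()
        while True:
            action = sample_action(obs)                      # e.g. from a local copy of the policy
            next_obs, reward, done, _ = env.step(action)
            queue.put((obs, action, reward, next_obs, done))
            obs = env.reset() if done else next_obs

    if __name__ == "__main__":
        queue = mp.Queue(maxsize=10_000)
        workers = [mp.Process(target=rollout_worker, args=(i, queue)) for i in range(8)]
        for w in workers:
            w.start()
        while True:                                          # learner loop
            batch = [queue.get() for _ in range(256)]        # drain a batch of transitions
            update_network(batch)                            # placeholder for the training step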


It's possible to speed it up by running the game as fast as it can go (so, not limited as it normally is for human consumption). They talk about running it at 9x speed, so roughly a month of in-game driving fits into about 80 wall-clock hours.

I just got into Trackmania recently. Very difficult game, especially on a keyboard, but fun! It's crazy to see how dedicated the pros are. I got into it after watching the streamer Wirtual try to beat the hardest map the game has seen (Deep Dip 2), for a prize pool of something like $30,000. It's an insanely hard tower-climb map where, if you fall, you have to start completely over. A 1-2 hour run could just disappear. Anyway, over just a few weeks, Wirtual put several hundred hours into the map, with over 1,500 falls... and then gave up, understandably :P

I follow RL from the sidelines (I've dabbled with it myself), and have seen some of the cool videos the article also lists. I think one of the key points (and a bit of a personal nitpick) the article makes is this:

> Thus far, every attempt at training a Trackmania-playing program has trained the program on one map at a time. As a result, no matter how well the network did on one track, it would have to be retrained - probably significantly retrained

This is a crucial aspect when talking about RL. Most of the Trackmania AI attempts focus on one track at a time, which is not really a problem here, since the goal is, for a given individual track, to outperform the best human racers.

However, it is this nuance that a lot of more business-oriented people don't get when they're being sold on some fancy new RL project. In the real world (think self-driving cars), we typically want agents that generalize far better.

Most of the RL techniques we have do rather well in these kinds of constrained environments (in a sense, they eventually start overfitting to the given environment), but making them behave well in more varied environments is much harder. A lot of beginner RL tutorials also fail to make this explicit, and will e.g. show how to train an agent to find the exit of a maze without ever trying it on a newly generated maze :).


By the end of the article, and in the subsequent article, they're no longer doing it one track at a time.

At first I thought you were talking about some Rocket League AI stuff haha

Wanted to point out that Linesight, the final project described in the article, released a new update last month, and it now beats world records on about a dozen maps, both official and user-made ones: https://www.youtube.com/watch?v=cUojVsCJ51I It's some really impressive stuff.

Brilliant, thanks for the update!

Make sure to read the follow-up post linked at the bottom of this one. It's vastly entertaining, in a watching-an-open-source-train-wreck kind of way. You have to admire the persistence.

Tangentially related, is anybody besides the autonomous car folks developing games or virtual environments designed from the ground up for exporting machine learning APIs? By this I mean exporting game state and accepting game controls through the network without going through adapter contortions.


I think BeamNG.drive would fall under that category

> Nienders concluded that this was due to the difference in the information available. Sophy had information about the track curvature of the upcoming 6 seconds of track, based on the current speed. TMRL, however, only had distance measurements from the LIDAR. While the TMRL program could plan for the next turn, it could not plan two turns ahead, and this fundamentally limited the program to mere safe driving, avoiding walls and crashes, but never optimizing.

I think that point is an important one. ML algorithms work better when they are given better context. Especially in programming, it is clear the models are trained on code rather than on repositories. They know about files and repositories, but I always get the impression that they are totally clueless about whole programs.

What could be done better for code is to provide, during training, more data about where each function is located in the project, which other files define or call similar functions, and so on. In general, before a piece of code is fed into training, do a little bit of data mining on the project, like what the tree-hugger project [1] enables. Tree-hugger is a somewhat older codebase, though, and tree-sitter has advanced a lot over the last four years.
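
As a rough sketch of what I mean, using the tree-sitter Python bindings (hypothetical code; it assumes a Python grammar already loaded into PY_LANGUAGE, and how you load it depends on the bindings version):

    # Sketch: tag each function with its file path and location before adding it
    # to a training corpus. PY_LANGUAGE is assumed to be a loaded Python grammar.
    from pathlib import Path
    from tree_sitter import Parser

    parser = Parser()
    parser.set_language(PY_LANGUAGE)  # newer bindings take Parser(PY_LANGUAGE) instead

    def functions_with_context(repo_root):
        records = []
        for path in Path(repo_root).rglob("*.py"):
            source = path.read_bytes()
            tree = parser.parse(source)
            for node in tree.root_node.children:
                if node.type == "function_definition":
                    name = node.child_by_field_name("name")
                    records.append({
                        "file": str(path.relative_to(repo_root)),
                        "name": source[name.start_byte:name.end_byte].decode(),
                        "line": node.start_point[0] + 1,
                        "code": source[node.start_byte:node.end_byte].decode(),
                    })
        return records

Each record could then be extended with the files that define or call similar functions before it goes into the training set.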

In my opinion, a 5x to 10x improvement on code is within reach, with no need to increase GPU compute or electricity.

[1] https://github.com/autosoft-dev/tree-hugger


Nice work! An enjoyable read. Edit: And the newer post!

I do love Trackmania. I'd like to play 2020 but alas I do not have a compatible computer. I mostly play the Wii version.


That part about motivation was outstanding. I am pasting it below in case folks missed it.

"Of course, we come into this with modest ambitions: make humans obsolete in the process of racing Trackmania. Perhaps we’ll be featured in the cover of a scientific journal to show how AI now dominates all racing sports. The kind of thing we can reasonably expect when entering a project with no directly relevant experience.

Obviously, logically, we know that this kind of oversized ambition isn’t realistic. No one at Google is going to see this and say “They really do need eight billion dollars in cloud credits for the noble task of racing emulated cars.” But having ludicrous ambitions is a key aspect for avoiding the thing that kills most projects: ennui. For us, if a project is merely aiming to be “kind of good” then it’s not going to go anywhere, because being “kind of good” is not exciting. So we trick our brains: we envision the realistic best case scenario (a pretty good racing program) and multiply it by a thousand. Is it grounded? No. But when you’re spending eight hours trying to figure out how to configure virtual desktops over ssh, you need something to keep you going. When every block you put down is a step towards a towering edifice, setbacks feel smaller, and wins feel like they’re worth something more.

So, that’s always been how we approach the early days of these projects, with unrealistic goals that we “know” are impossible. Once its potential starts to materialize, once you have the shape of the work laid down, then you can start to discard the grandiose for the grounded, and then you can start to take those castles in the clouds and boil them down into the bricks of the house. Projects like this are a long haul. We’ve already spent four full months trying to get TM2020 and OpenPlanet both installed. If you don’t have a good goal, you won’t have a reason to make it off the starting line."


For those who only read the comments, be sure to check out the videos by Yosh. They're amazing, and do a great job explaining how reinforcement learning works in practice:

https://youtube.com/watch?v=Dw3BZ6O_8LY

https://youtube.com/watch?v=kojH8a7BW04


It never made sense to me why they raycast from the car. Humans don't play this way. The car is an abstraction the model doesn't need to care about.

It literally doesn't matter if it's 1px or 100 meters to the wall, just learn to not hit it.

Instead, measure from the _camera_. That's all that matters. That's what humans do when we play.

Bonus: with this added perspective you'll be able to drive maps with hills and jumps. Not just flat maps.
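
Concretely, something like this: build the ray fan from the camera's pose instead of the car's, then feed the same distance raycast the tooling already uses. The vector names are just assumptions about what the game exposes:

    # Sketch: ray origins/directions taken from the chase camera's pose rather
    # than the car's. cam_pos, cam_forward and cam_up are assumed to come from
    # the game's camera state; the raycast itself is unchanged.
    import numpy as np

    def camera_rays(cam_pos, cam_forward, cam_up, n_rays=16, fov_deg=90.0):
        right = np.cross(cam_forward, cam_up)                 # camera-space right vector
        angles = np.radians(np.linspace(-fov_deg / 2, fov_deg / 2, n_rays))
        rays = []
        for a in angles:
            d = np.cos(a) * cam_forward + np.sin(a) * right   # fan of rays in the camera plane
            rays.append((cam_pos, d / np.linalg.norm(d)))
        return rays                                           # feed each pair to the existing raycast

A vertical spread (built the same way from cam_up) would cover the hills-and-jumps case.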


RL expert here: the problem with vision is that, most likely, it takes too long to render and to process the render in your NN.

At that point you need to run the game in real time instead of faster, so you need a lot of compute to generate your data. You will also need the bandwidth to throw the data around, and the GPUs to handle the larger input size.

It's definitely possible, but not the place to start, and will require a lot more compute infrastructure.


I’m not saying render the scene. I’m saying take your measurements from where the camera is (or would be).

It would require no extra infra or compute.


They're driving non-flat maps by the end of the article and in the subsequent article. AFAIK they can only go by what they see on screen; they don't have access to camera rendering data.

I’m not saying render the scene. I’m saying take your measurements from where the camera is (or would be).

Always wondered if something similar would be possible with milsim games like DCS World.

E.g., could you improve or replicate missile intercept algorithms?



