I created an RPG game backend for a game called Path of Exile. Not an FPS but similar challenges. I don’t know how similar any of this is to other game backends, but I’ll supply a few details.
Our backend consists of a few somewhat large services that are broken up mostly around how they are sharded.
The biggest one is the account authority, which contains most of the account/character/item data and handles the vast majority of traffic.
We have 5 shards of that (sharding on account id) with 2 read only replicas of each one of those. All the read only requests go to one replica, and the other replica is for redundancy.
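To make the shard layout concrete, here is a minimal sketch of how a request might be routed. The modulo mapping, the host names, and the `Shard`, `build_shards`, and `route` names are all assumptions for illustration; the only details taken from the description above are the five shards keyed on account id, one replica serving reads, and one replica held for redundancy.

```python
from dataclasses import dataclass

NUM_SHARDS = 5  # from the post: five account authority shards


@dataclass(frozen=True)
class Shard:
    primary: str          # handles writes
    read_replica: str     # serves all read-only requests
    standby_replica: str  # kept purely for redundancy


def build_shards(count=NUM_SHARDS):
    # Host names are hypothetical placeholders.
    return [
        Shard(
            primary=f"account-{i}-primary",
            read_replica=f"account-{i}-replica-a",
            standby_replica=f"account-{i}-replica-b",
        )
        for i in range(count)
    ]


def shard_index(account_id, count=NUM_SHARDS):
    # Assumption: plain modulo. The post does not say how ids map to shards.
    return account_id % count


def route(shards, account_id, read_only):
    """Pick the host that should serve a request for this account."""
    shard = shards[shard_index(account_id, len(shards))]
    return shard.read_replica if read_only else shard.primary


shards = build_shards()
read_host = route(shards, 123, read_only=True)
write_host = route(shards, 123, read_only=False)
```

A real deployment would also need failover to the standby replica, which this sketch leaves out.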
There are also other services like the party manager, ladder authority, instance manager, etc.
All of those shard on different things, which is why they are separate services.
The instance managers handle creating game instances which are the servers that the players actually play on.
We have a pile of servers which we call instance servers, each of which runs an instance spawner. When it starts up, the instance spawner adds its capacity to an instance manager and creates a process called a prespawner. This prespawner loads the entire data set of the game and then waits for orders from the instance spawner.
When the instance spawner wants to create a new game instance, the prespawner runs fork() and then the new process generates its random game terrain, which takes a few hundred milliseconds.
Because all the game resources are loaded before the fork, they are already in memory that is shared between all of the instances running on the machine. Each instance therefore takes only 5-20 MB of memory, which is mostly the generated terrain and monsters.
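The load-then-fork pattern can be sketched roughly as follows. This is not the actual prespawner: `load_game_data`, `run_instance`, `spawn_instance`, and `wait_for_instance` are hypothetical names, the data set is a tiny stand-in, and the child reports back over a pipe only so the example is observable. It requires a Unix-like system because it calls `os.fork()`.

```python
import os
import random


def load_game_data():
    # Stand-in for the large, read-only data set loaded once before any fork.
    return {"monster_types": list(range(1000))}


def run_instance(game_data, seed):
    # Runs in the child: only per-instance state (the "terrain") is built here.
    rng = random.Random(seed)
    terrain = [rng.choice(game_data["monster_types"]) for _ in range(16)]
    return len(terrain)


def spawn_instance(game_data, seed):
    """Fork a child that reuses the already-loaded game_data."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report the terrain size to the parent, then exit
        # without running the parent's cleanup handlers.
        exit_code = 1
        try:
            os.close(read_fd)
            size = run_instance(game_data, seed)
            os.write(write_fd, str(size).encode())
            os.close(write_fd)
            exit_code = 0
        finally:
            os._exit(exit_code)
    # Parent process: keep only the read end of the pipe.
    os.close(write_fd)
    return pid, read_fd


def wait_for_instance(pid, read_fd):
    """Collect the child's report and its exit code."""
    with os.fdopen(read_fd, "rb") as pipe:
        report = pipe.read().decode()
    _, status = os.waitpid(pid, 0)
    return report, os.WEXITSTATUS(status)


game_data = load_game_data()          # loaded once, before forking
pid, fd = spawn_instance(game_data, seed=42)
report, exit_code = wait_for_instance(pid, fd)
```

One caveat: in CPython, reference counting writes to object headers, so pages that are shared copy-on-write after the fork get copied more readily than they would in a native server. The sketch shows the process structure, not the memory savings.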
We typically run about 500 instances on the min-spec, cheap single-processor Xeon servers we rent. This used to be around 1600 instances in the early days, but the game became more CPU intensive as it got more hectic over the years.
All the instances connect to routers. There is one router per instance machine; those connect to a few routers per data center, which in turn connect to a set of core routers that also have all the backend services connected to them.
These routers are important because they know where everything and everyone currently is.
The routers work sort of like internet routers, but instead of IP addresses, you address your requests to logical entities or groups, which can move around and which the router network is tasked with keeping track of.
So for example, when you whisper someone, you are sending a message to Account:123 and it will find its way to whatever server currently has Account:123 on it. If you send a message in global chat to GlobalChat:1, it will be multicast through the network to all the servers which have currently registered an interest in hearing GlobalChat:1.
If you add someone to your friend list, the server you are on registers interest in the multicast group AccountSession:123, which is the group that account 123 multicasts all its status updates to, like moving between zones or leveling up.
Parties, leagues, guilds, and so on all have multicast groups associated with them.
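The addressing model described above, with unicast to a logical entity and multicast to interest groups, could be sketched as a single in-memory router. The `Router` class and its method names are hypothetical, and the real system is a hierarchy of routers rather than one table; only the address formats (Account:123, GlobalChat:1, AccountSession:123) come from the description.

```python
from collections import defaultdict


class Router:
    """Toy single-node router: logical addresses instead of IP addresses."""

    def __init__(self):
        self.locations = {}             # entity address -> current server
        self.groups = defaultdict(set)  # group address -> interested servers
        self.delivered = []             # (server, address, payload) log

    def register(self, address, server):
        # Called again whenever the entity moves to another server.
        self.locations[address] = server

    def subscribe(self, group, server):
        self.groups[group].add(server)

    def unsubscribe(self, group, server):
        self.groups[group].discard(server)

    def send(self, address, payload):
        """Unicast to whichever server currently hosts the entity."""
        server = self.locations.get(address)
        if server is None:
            return False
        self.delivered.append((server, address, payload))
        return True

    def multicast(self, group, payload):
        """Deliver to every server that registered interest in the group."""
        targets = sorted(self.groups.get(group, ()))
        for server in targets:
            self.delivered.append((server, group, payload))
        return targets


router = Router()
router.register("Account:123", "instance-7")
router.send("Account:123", "whisper: hello")          # goes to instance-7

router.subscribe("AccountSession:123", "instance-2")  # a friend's server
router.multicast("AccountSession:123", "leveled up")  # status update fan-out
```

Because senders only ever name the logical address, moving an account to another server is just another `register` call and nothing on the sending side changes.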
If you have any more questions then feel free to ask.
Very interesting to see a GGG answer here! I admit to being very curious about the Path of Exile architecture, and your answer has only whetted my appetite. I have some questions that might give me a clearer picture of the architecture, if you are up for answering them:
1. How is data replicated across regions? And how is trade across regions handled? Do the instance servers hand over character data to the account authority in the new region?
2. I remember speculation about some builds that caused extreme amounts of server side compute and slowed things down. Was this compute performed on the instance servers? Like poison/chain/monster damage calculations?
3. Is there any sort of automated detection of inconsistent game states done by the instance servers? Duping protections or some such?
4. What is the scaling plan like at GGG? Does the system have obvious bottlenecks that are known or is it easy to scale for the near future?
How do you structure your database schema and handle things like upgrades or versioning? Also curious how the game instances interact with the database and at what frequency and granularity.
Thanks for taking the time to write up that overview, very cool to read!