Overview of Running an Online Game for 3 Years (hookrace.net)
197 points by def- on June 11, 2016 | 35 comments



€10/m/location is damn impressive. I've run servers for a turn-based asynchronous and real-time casual game for the past three years. With ~700k unique monthly players (about 200k/day) we do about 1500 requests/s at peak and pay thousands a month for our AWS stack. I'm not mad at it, I think we get great utility for what we pay, but this is lean and mean for realz.

Kudos!


AWS is nowhere near cheap when it comes to raw performance per dollar. You are paying a very large premium for the flexibility of spinning up an arbitrary number of instances at any time.


Thanks! Interesting to hear about your experience as well. By requests per second, do you mean packets? I was curious, and with 330 players on DDNet right now[1] we get ~7400 packets/s incoming and ~8300 packets/s outgoing.

[1] https://ddnet.tw/status/


My guess is that he's talking about HTTP requests.


Yes, I am talking about HTTP requests. We're just exploring moving over to WebSockets; the original implementations were done before that was practical on mobile networks.


Unless you need AWS for the scaling it offers, it tends to be the worst possible host you could choose.

And even if you need the scaling, a mixed stack with dedis and EC2 on demand shouldn't be hard to run.


> Unless you need AWS for the scaling it offers, it tends to be the worst possible host you could choose.

Unless, in turn, you use reserved instances, which bring the price down to be equal to or cheaper than most other options. "dedis and EC2 on demand" is an order of magnitude more complex, probably actually more expensive, and increases your failure cases significantly.


I think you're a little out of touch with hosting prices these days.

https://www.ovh.com/us/dedicated-servers/details-servers.xml... For 12 months this will cost you about $800.

A somewhat comparable EC2 instance (that still has way lower specs), the m3.2xlarge, has an up-front cost of $1772 and an hourly cost of $0.146. So you'd be looking at a total yearly bill of about $3051 ($1772 + $0.146 × 8760 hours), which would pay for 3 of those dedis.

I honestly don't know how anyone could describe that as competitive.

> "dedis and EC2 on demand" is an order of magnitude more complex, probably actually more expensive.

If you're one of the few people that actually need the rapid scaling that EC2 can provide, sure. But that's an extremely specific and rare use case.


I'm alarmingly familiar with OVH. And their hardware failures. And their slow support exactly when you need them most. And their poor inventory control, leaving popular system configurations unavailable when I (or, rather, my clients) need them.

I'm also familiar with exactly how misleading a specs comparison ends up being when you can architect an application to leverage burstable instances and rapid auto-scaling, letting you spend little during slow periods while still responding gracefully to load. This isn't some deep-magic thing for sites with tens of thousands of instances, either--I've had clients with ten total instances benefit both financially and operationally from building out fairly straightforward elastic capacity (not least because the same code paths necessary for elasticity also make things like setting up development environments single-click affairs, which enables deeper and more complete testing of the application to ensure correctness).

Even were a $2200 premium a credible number, that's eleven hours of my billing rate. Reimplementing what AWS provides for even a fairly straightforward system--easy, API-driven system deployment and configuration (and while OVH's public cloud APIs can enable some of this, your "dedis" cannot), straightforward and API-controlled support services of more or less any stripe, integrated system monitoring and alerting--would be hundreds of hours of work, be a worse solution, and require ongoing maintenance. (A decent hosting provider will offer some of that, of course. But not all. And you'll pay for it.)


> I'm alarmingly familiar with OVH. And their hardware failures. And their slow support exactly when you need them most. And their poor inventory control, leaving popular system configurations unavailable when I (or, rather, my clients) need them.

That's good to know. I was contemplating going with them for my next project. AWS it is then!

The only problem I see with AWS is the configuration overhead. Want storage? Well, we have a separate service for that! Now you have to understand the pricing, the API, etc. just to integrate it into your stack.

I completely understand how this modular approach helps with scaling, but when starting out it just feels like way too much overhead, especially for a single dev like me.


It can certainly feel like a lot of overhead, but in many ways I think it's the "eat your vegetables" part of really understanding what you're building and, through the development of your infrastructure, figuring out where your pain points lie.

I do some advisory stuff for students and bootstrapping startups that are looking to work in cloud environments. Feel free to drop me an email (in my profile) if you'd like to chat.


>I'm alarmingly familiar with OVH. And their hardware failures. And their slow support exactly when you need them most. And their poor inventory control, leaving popular system configurations unavailable when I (or, rather, my clients) need them.

There are plenty of places that aren't OVH, but I figured they'd be the most relevant choice here since game servers were being discussed earlier. DDoS protection is cool and all, and it's something you really don't get on EC2.

>This isn't some deep-magic thing for sites with tens of thousands of instances, either--I've had clients with ten total instances benefit both financially and operationally from building out fairly straightforward elastic capacity (not least because the same code paths necessary for elasticity also makes things like setting up development environments single-click affairs, which enables deeper and more complete testing of the application to ensure correctness).

If for the cost of 10 instances they could've had 30 dedicated servers with better specs, did they really benefit very much? While I can certainly appreciate the part about development environments, applications that would actually benefit from such scaling are rather rare. Although, admittedly, if they hired you they probably did need it.

>Even were a $2200 premium a credible number, that's eleven hours of my billing rate.

That's $2200 per instance; does someone only running a couple of boxes even need your services?

>Reimplementing what AWS provides even a fairly straightforward system--easy, API-driven system deployment and configuration (and while OVH's public cloud APIs can enable some of this, your "dedis" cannot), straightforward and API-controlled support services of more or less any stripe, integrated system monitoring and alerting--would be hundreds of hours of work, be a worse solution, and require ongoing maintenance.

And again, I believe most businesses simply don't need what AWS provides. AWS is certainly a good choice if you actually need what they offer, but very few do.


> $2200 per instance, does someone only running a couple of boxes even need your services?

The reinsurance company I used to work for could be run on about 5 AWS servers, compute-wise... they do about $3 trillion in in-force life insurance policies a year.


Hit enter early. My point was that there are plenty of huge businesses from a money perspective that don't require too much compute.


I'd like to chime in re: operationally. It's the single largest reason we use AWS. I'd have to add at least one head to manage infrastructure I couldn't define in JSON, and that's a lot more expensive than what we pay in AWS premium.


What makes this comparison even crazier is that those AWS prices don't include any (very expensive) bandwidth or block devices.

Also, once you're paying upfront for reserved instances, you've lost one of the major advantages: your cash flow scaling with your usage.


Reserved instances can be reserved for only a percentage of available hours.


Cloud platforms like AWS offer other things besides just scaling. Like Hadoop processing, hosted databases, email infrastructure, DNS services and the list goes on and on.


Care to share some details about your setup? 1500 reqs/sec should not cost very much at all, definitely not "thousands a month".


Well, the larger chunk of that is the database. A db.r3.4xlarge runs about $1600 by itself. We have several hundred gigabytes of data, and indexes alone keep us in that instance size for the RAM allotment.

Aside from that, we run 2-3 c3.xlarge API servers and two c3.medium webservers (hosting the Facebook versions of our games), with two more c3.medium servers joining for a new launch next month.

We have one large (size escapes me) ElastiCache instance.

3 ELBs sit in front of those servers, all in a VPC.

Beyond that, we have 11 or so S3 buckets that back CloudFront distributions for static content for websites and whatnot.

We also pay for several TB of traffic per month, and enterprise support.

Finally, we're paying for not having to fix any hardware ever, 5-minute incremental DB backups, a VPC I can define in a JSON doc, automatic failovers, and so much more, managed by yours truly, a software engineer, not a network/server/devops/etc. engineer. Yeah, there's a premium, but so far it's paid off.


What drove the choice to the larger instance types instead of more of the small ones? Not questioning your judgment, just curious.


We moved from Python to Go. With Python behind uwsgi->nginx, we'd try to tune the number of uwsgi processes to the number of cores, and it was easy to scale up based on processes = number of concurrent requests handled. With Go, every request is handled concurrently, which is much more efficient with cores and RAM. I found that my requests/core were being artificially limited by RAM in an autoscaling group (we had scaling triggered at about avg 80% CPU), so I gave it some more RAM to play with.

Now, instead of 15-20 c3.medium servers at peak, we run three c3.xlarge instances at peak. I can be sure that the larger server will handle more concurrent requests than process-restricted uwsgi, and thus be able to better allocate the resources given to us by larger instances.


Thanks for the reply. Makes sense. Lots of people don't bother to look at their instance types per application, or they run so much in a single instance type that it's IO-, memory-, and network-bound all at once depending on what the users are doing.


Would you get anywhere with those Scaleway ARM BareMetal servers? They're insanely cheap if you can run on ARM.


This was really great to read, in-depth and interesting, thanks a lot for taking the time to write it. :)


The author writes: "Reduce the number of syscalls by caching the value of gettimeofday() until a new tick happens or network packet comes in." But I'm pretty sure glibc on recent Linux handles gettimeofday in user space, without a context switch (the kernel maps the data into userspace). I guess caching the value locally and updating it once a second or so would still help if there are thousands of calls/sec, but not as much as if it were really a syscall.
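
For reference, caching along those lines can be as simple as something like this (a minimal sketch in C; `cached_now` and `invalidate_cached_time()` are illustrative names, not DDNet's actual code):

    /* Sketch: refresh the cached time only when the game loop signals a
       new tick or an incoming packet, instead of calling gettimeofday()
       on every read. */
    #include <stdbool.h>
    #include <sys/time.h>

    static struct timeval cached_now;
    static bool cached_valid = false;

    /* Call at the start of each tick and whenever a packet arrives. */
    void invalidate_cached_time(void) { cached_valid = false; }

    struct timeval current_time(void)
    {
        if (!cached_valid) {
            gettimeofday(&cached_now, NULL); /* at most once per tick/packet */
            cached_valid = true;
        }
        return cached_now;
    }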


You're right. Unfortunately with virtual machines that doesn't always work. You can enable it manually, but that's less efficient because the kernel can't use paravirtualization for the clock then. I stumbled upon this a month ago: https://ddnet.tw/irclogs/2016-05-17.log

    21:37 <@deen> A third of the sys calls of a DDNet server are recvfrom and
                  sendto each
    21:37 <@deen> they always occur in large chunks, so ideal for
                  sendmmsg/recvmmsg
    21:38 <@deen> the last third are mostly strange time and gettimeofday
                  calls, I thought I got rid of most of them
    21:40 <@deen> server with 30 players causes 3000 syscalls a second
    22:00 <@deen> PACKET_MMAP is very cool, reading packets with 0 syscalls,
                  too bad it's not for regular applications (requires root,
                  doesn't work with normal udp socket):
                  https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt
    22:04 <@deen> and then the glibc version matters a lot for syscalls. new
                  versions of glibc don't syscall at all for gettimeofday
    22:05 <@deen> (or something else is causing that, not sure yet)
    22:13 <@deen> Reading the glibc implementation, that's done by vDSO,
                  interesting: http://man7.org/linux/man-pages/man7/vdso.7.html
    22:32 <@deen> totally confused why some of our servers use vdso
                  gettimeofday, others not even though they have more recent
                  kernel and glibc
    22:41 <@deen> ok, probably depends on the underlying clock that the vps
                  uses. pvclock is used with kvm and doesn't support vdso. but
                  looks like there's some progress being made:
                  https://lkml.org/lkml/2015/12/9/914
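
On the recvmmsg point: a rough sketch of draining a batch of queued UDP datagrams in one syscall instead of one recvfrom() per packet (the batch size, buffer size and `sock` descriptor are placeholders, not DDNet's actual networking code):

    /* Sketch: read up to 64 queued UDP datagrams with a single recvmmsg()
       call; returns the number of packets received, or -1 on error. */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/socket.h>

    #define BATCH 64
    #define PKT_SIZE 2048

    int read_packet_batch(int sock, char bufs[BATCH][PKT_SIZE])
    {
        struct mmsghdr msgs[BATCH];
        struct iovec iovs[BATCH];
        memset(msgs, 0, sizeof(msgs));
        for (int i = 0; i < BATCH; i++) {
            iovs[i].iov_base = bufs[i];
            iovs[i].iov_len = PKT_SIZE;
            msgs[i].msg_hdr.msg_iov = &iovs[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        /* MSG_DONTWAIT: return immediately with whatever is queued. */
        return recvmmsg(sock, msgs, BATCH, MSG_DONTWAIT, NULL);
    }

After the call, each mmsghdr's msg_len field tells you how many bytes landed in the corresponding buffer.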


Awesome and inspiring to me, great work!

I am now looking at LFE (Lisp Flavored Erlang) and Elm to create a very small online game. It makes me want to maintain my C/C++ chops.

It's sad Apple is so walled in that you need a VM to build for OS X, and iOS doesn't even make the list. I have an iPad, but I use an Android phone for that reason, and I only program mobile for Android. Apple is getting better at supporting iOS devs of late though...


Hey great work!

Curious to hear what the client stack was. Did you use LibGDX by chance?


Thanks! You can see the client here: https://github.com/ddnet/ddnet

SDL2, OpenGL, FreeType, pnglite, zlib, curl, md5, wavpack, opus, json-parser

So it's pretty much low-level, which keeps performance and flexibility high.


I just re-downloaded (vanilla) Teeworlds the other day. After playing a lot of QW and Xonotic, it's nice to play something like HLDM or Teeworlds that's a bit more wacky and less competitive. There is no bunnyhopping in Teeworlds. Just cute fluffballs, hookshots, and heavy weaponry. Although, for whatever reason, all of the players are in the EU or SA, and I'm US East, so the ping's really high. One of many reasons I want to get a GPU that can handle UT4. Yes, mine is really that bad.


What was the reason for forking from Teeworlds?


Teeworlds is a shooter. People wanted to race in Teeworlds, so a modification was required. From there it developed into multiplayer racing, and over the last few years DDNet has kept adding features.


I would like to see this kind of post for slither.io


Maidsafe will eliminate all these server problems!



