GameNetworkingSockets – Reliable and unreliable messages over UDP (github.com/valvesoftware)
157 points by ivanfon on March 31, 2018 | 44 comments


Second Life does something very similar. Reliable and unreliable messages, binary format, multiple messages in a single datagram.
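As a rough illustration of "multiple messages in a single datagram", here is a minimal Python sketch. The 3-byte header (one flags byte, two length bytes) and the names `pack_messages` / `unpack_messages` are hypothetical and chosen only for this example; this is not Second Life's or Valve's actual wire format.

```python
import struct

# Hypothetical per-message header: flags (1 byte) + payload length (2 bytes),
# in network byte order. Not any real game's wire format.
HEADER = struct.Struct("!BH")
FLAG_RELIABLE = 0x01


def pack_messages(messages):
    """Pack an iterable of (reliable, payload_bytes) into one datagram body."""
    parts = []
    for reliable, payload in messages:
        flags = FLAG_RELIABLE if reliable else 0
        parts.append(HEADER.pack(flags, len(payload)))
        parts.append(payload)
    return b"".join(parts)


def unpack_messages(datagram):
    """Split a datagram body back into a list of (reliable, payload_bytes)."""
    out, offset = [], 0
    while offset < len(datagram):
        flags, length = HEADER.unpack_from(datagram, offset)
        offset += HEADER.size
        out.append((bool(flags & FLAG_RELIABLE), datagram[offset:offset + length]))
        offset += length
    return out


datagram = pack_messages([(False, b"pos:1,2,3"), (True, b"chat:hello")])
messages = unpack_messages(datagram)
```

A real implementation would also bound the total size to stay under the path MTU and validate lengths before slicing.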

Unreliable messages are for updates that are superseded by later messages. There's no point in retransmitting; you always want the latest object positions.

The big problem is message priority. Games need a low-latency, low-traffic channel for updates and a high-latency, high-traffic channel for assets. It's tough making this work on the open Internet. End-to-end QoS just isn't widely available, so you have to use much less than the full available bandwidth to keep intermediate FIFO buffers from filling up. This is related to the "bufferbloat" problem: network devices now all have lots of RAM, and if they have FIFO queuing and less output bandwidth than input bandwidth, they will build up huge queues that generate huge latency.

For QoS to work in the wild, there has to be some throttling or incentive to discourage too much high priority traffic. You'd like to have under 5% of your traffic at high priority. It's hard to make this work in the public Internet.


The Second Life client is also free/open source software (LGPLv2) [0]. There are multiple competing forks with significant user bases, the most popular of which is Firestorm [1].

The server software is closed source, but the OpenSimulator project [2] exists as a third-party reimplementation.

[0] http://wiki.secondlife.com/wiki/Linden_Lab_Official:Second_L...

[1] http://www.firestormviewer.org/

[2] http://opensimulator.org/wiki/Main_Page


Nearly all real time games do something similar (though in the past I’ve skipped stuff like message packing). I’m definitely interested in digging through this to see what they’ve done differently. Is there anything unique about SL’s reliable UDP implementation? I don’t believe the same system was carried over for Sansar.


SL uses the same transport protocol between servers and between server and client. SL's virtual world is divided into regions 256 meters on a side. Each region is managed by a separate process. Movement of avatars and vehicles across region boundaries does not work very well, because there are delays and race conditions.

It's supposed to be a consistent-eventually system; the world doesn't stop just because of inter-region movement. So there's temporary inconsistency, resolved correctly most of the time. It's rather ad-hoc; this is 1990s technology. Also, the client is over-involved in region crossing. The client isn't trusted, but it's involved in the handoff of avatar control from one region to another. This was probably a design mistake - the communications rerouting and virtual object movement should have been separate.

Almost everybody else evaded that problem. Sansar has bigger regions (up to 4km on a side, but usually much smaller) and no region crossing at all. It's a simpler client/server architecture. None of the major successors to SL (High Fidelity, Sansar, SineSpace) offer a full connected virtual world. They're more about "experiences" - go in, visit pre-built world, leave. This simplifies their scaling problem.

Second Life is an alternate universe to the World Wide Web. There's much the same functionality - chat, voice, graphics, text - but the approach is completely different. The web is mostly stateless. SL is all about state. Everything is persistent, including the state of executing scripts. So most of the tricks used to scale web applications don't work.


There were certainly decisions made with Sansar to avoid messiness that happened in SL. (You might say the entire existence of Sansar....)

Improbable supposedly offers large multiserver regions, but I don’t know if there is any user-generated content platform built on top of it.


> It's tough making this work on the open Internet.

µTP works OK. The trick is to precisely measure latency (these UDP packets have timestamps in them with 1µs resolution), send enough data to achieve good bandwidth, but just enough to not fill these FIFO queues in both endpoints, and in various routers between them.

https://en.wikipedia.org/wiki/Micro_Transport_Protocol
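The delay-measuring trick described above can be sketched roughly as follows. This is a simplified, hypothetical controller loosely modelled on delay-based congestion control; it is not µTP's actual code, and the class name `DelayBasedWindow`, the 100 ms target, the 1200-byte packet size and the gain are all assumptions made for illustration.

```python
class DelayBasedWindow:
    """Grow the send window while queuing delay is below a target, shrink above it."""

    def __init__(self, target_delay=0.100, mss=1200, gain=1.0):
        self.target = target_delay      # queuing delay to aim for, in seconds (assumed)
        self.mss = mss                  # assumed packet payload size in bytes
        self.gain = gain                # how aggressively to react (assumed)
        self.cwnd = 2 * mss             # bytes allowed in flight
        self.base_delay = float("inf")  # lowest one-way delay seen so far

    def on_ack(self, one_way_delay, bytes_acked):
        # The smallest delay ever observed approximates the empty-queue path delay.
        self.base_delay = min(self.base_delay, one_way_delay)
        queuing = one_way_delay - self.base_delay
        # +1 when queues are empty, 0 at the target, negative when over target.
        off_target = (self.target - queuing) / self.target
        self.cwnd += self.gain * off_target * bytes_acked * self.mss / self.cwnd
        self.cwnd = max(self.cwnd, self.mss)  # never drop below one packet
        return self.cwnd


window = DelayBasedWindow()
start = window.cwnd
grown = window.on_ack(one_way_delay=0.020, bytes_acked=1200)   # queues empty
shrunk = window.on_ack(one_way_delay=0.220, bytes_acked=1200)  # ~200 ms of queuing
```

The point of the sketch is only the sign of the adjustment: rising delay is treated as a signal to back off before any packet is actually dropped.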


I believe this type of QoS would easily violate “net neutrality”, which is why it hasn’t happened yet.


That is a misconception.

It’s totally legitimate to provide QoS based on open protocols, or even for particular classes of traffic. Net neutrality comes into play where providers start prioritising traffic based on the remote service provider.


Net neutrality governs providers making decisions based on who you are talking to and what you are talking about. It does not govern providers making decisions based on how you are talking (so long as how doesn't mean who or what).

As a separate issue, some providers are regulated (see Comcast) due to other laws. These Title II providers have additional rules that prevent them from making decisions on how. However, if they made such decisions on quality of service (rather than for profit gain, as they have in the past), it would be legal. The problem is that as a monopoly they have no reason to do that.


I've used ENet for my games which is pretty similar to this project. Unfortunately ENet does not support IPv6, and the pull requests on the ENet repository appear to be ignored. The ENet author refuses to accept any changes that break the ABI, which is highly unfortunate.

http://enet.bespin.org/


> GameNetworkingSockets is a basic transport layer for games.

I understand that this is primarily derived from a gaming platform, but is there anything that makes this useful specifically "for game" and not networked applications in general?


Let's start with why TCP won't work: every time a packet is lost or reordered, your on-screen avatar will put Michael Jackson's moonwalk to shame. UDP won't work by itself either: what happens if the packet that said I got the winning kill didn't make it to the server? There are all sorts of time-and-order-critical messages that are needed to correctly keep score in the game, for example. So obviously we need two kinds of channel: one for order-sensitive data (like score) and one for data where only the latest update matters (like movement).


If you round robin on a bunch of tcp connections you can avoid many of the packet loss latency issues (doesn't give an advantage over udp really, but allows you to work in tcp only environments).


Yeah, if you're gaming on a budget I suppose that would work, but I'd still be worried about the exponential backoff algorithm TCP uses, because packets aren't guaranteed to take an unblocked pathway on their subsequent attempts. It's possible to lose a whole bunch of packets before being successfully diverted. And furthermore, what happens if your opponent is scoring points while you're frozen in TCP rectification mode? You're going to need UDP for just about anything that involves players interacting in a 3-dimensional world.


It would be great for TCP to better address the reliable transmission of messages for games. These “reliable UDP” code bases in game engines don’t address all of the other issues, such as bandwidth-sharing fairness and avoiding saturation of network links, which ultimately will just make networks slower rather than faster and more reliable for everyone.


If you changed TCP sufficiently to make it a real-time protocol suitable for gaming, it wouldn't be TCP any more. Reliable streaming and real-time packet delivery are two completely different animals.

I would argue that bandwidth-sharing fairness with stuff going over UDP is easy: Just start dropping packets when pipes get full.

Most updates are going to be pretty small. It's not like game developers want the user experience of their titles to be bad, after all.


It’s not easy to share the bandwidth fairly, it’s taken decades of research and it continues to be improved in TCP.

You’re not considering a server: the bandwidth is very high on the backend and does saturate links. You run many servers per physical or virtual machine due to cost, so you can have 1000s of players connected over a single network path.


... which is why you provision servers and design your software and network architecture to take the demand (latency, bandwidth, etc.) into account. Data rates for online games are pretty predictable. A 10 Gbit fiber connection to a racked server doesn't cost that much.

At the datacenter level you're making sure that the bandwidth you bought from providers is sufficient (and ideally, redundant), and that you can shift load from one area to another if necessary. You can buy this capability from AWS or Azure, or build it yourself in many different ways.


The penalties in TCP for dropped packets are too great, and there's essentially nothing anyone can do about it. I'm editorializing a little, but to me it's always felt like TCP's main goal was "be a good Internet citizen" to the detriment of basically everything else.

In fairness that was probably a crucial component of the Internet's success, but you don't get the good w/o the bad.

But re: "this will result in bandwidth hogging", routers will just drop your packets if you oversaturate them, so that's not a worry in practice. Well, it is for developers but they use rate scaling algorithms in those cases.


That's an interesting perspective. My personal folk mythology of TCP is that once upon a time, there were terminals connected to mainframes by serial lines, then someone wanted to connect to a mainframe from their minicomputer, so invented a remote login program, which sort of emulated a serial line over the network, then when the ARPAnet came about, someone invented TCP to do remote logins over that. Then, because TCP existed and worked well, people invented millions of other application protocols that worked on TCP's emulated serial line, and got into the habit of thinking about protocols as things which work on emulated serial lines, and so there has never been much demand for anything else.


Hah yeah I can see that. I mean, don't mess w/ success right?


This code implements RFC 5348-style TCP friendly rate limiting [0]. Is your concern about the implementation being wrong, or about the algorithm in 5348 somehow being insufficiently fair?

[0] https://tools.ietf.org/html/rfc5348
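For readers who have not opened the RFC, the throughput equation from RFC 5348 (Section 3.1) can be written as a small Python function. This is a sketch of the published formula only, not code from GameNetworkingSockets; the function name `tfrc_rate` and the default `t_rto = 4 * rtt` are assumptions made for this example.

```python
from math import sqrt


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/second per the RFC 5348 throughput equation.

    s     -- segment size in bytes
    rtt   -- round-trip time in seconds
    p     -- loss event rate, 0 < p <= 1
    b     -- packets acknowledged by one ACK
    t_rto -- retransmission timeout in seconds; defaults to 4 * rtt
             (a simplification assumed for this sketch)
    """
    if t_rto is None:
        t_rto = 4 * rtt
    denominator = (
        rtt * sqrt(2 * b * p / 3)
        + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    )
    return s / denominator


low_loss = tfrc_rate(s=1200, rtt=0.1, p=0.01)
high_loss = tfrc_rate(s=1200, rtt=0.1, p=0.05)
```

The allowed rate falls as either the loss event rate or the round-trip time rises, which is what makes a sender using it behave roughly like a TCP flow under the same conditions.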


Anyone know how this differs from RTP?

I haven’t worked with either but did review the RTP spec, and many of these features appear similar.

RTP also has some multicasting features built in for broadcast delivery, which can be useful in certain contexts.

Edit: the reason I ask is that RTP is getting wide scale deployment and testing as it’s being used for WebRTC.


Also worth comparing to WebRTC data channels, which are SCTP.


Stupid question. How do you solve (de)fragmentation and out of order delivery with unreliable protocol? What are cases and is it possible to make it simple?


You generally only use this for data where only the latest update matters. So if you get "Packet 18" "Packet 30" "Packet 19", you take 18, then 30, then ignore 19. The canonical example is an object's position; usually if you know what something's position is at 30 then information about where it was at 19 is so out of date it's useless.
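The "take 18, then 30, then ignore 19" rule fits in a few lines of Python. This is a minimal sketch; the class name `LatestOnlyReceiver` is made up for the example, and it ignores sequence-number wraparound, which a real protocol with fixed-width counters has to handle.

```python
class LatestOnlyReceiver:
    """Applies only updates newer than anything seen so far; stale ones are dropped."""

    def __init__(self):
        self.latest_seq = -1  # nothing accepted yet
        self.state = None     # most recent payload that was applied

    def on_packet(self, seq, payload):
        """Return True if the packet was applied, False if it was out of date."""
        if seq <= self.latest_seq:
            return False  # older than (or a duplicate of) what we already have
        self.latest_seq = seq
        self.state = payload
        return True


receiver = LatestOnlyReceiver()
results = [receiver.on_packet(seq, {"pos": seq}) for seq in (18, 30, 19)]
```

Here `results` is `[True, True, False]`: packets 18 and 30 are applied and 19 is discarded on arrival.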

The gain here is that you typically get the latest information as fast as possible. The downsides, of course, are that there's a pretty steep decline on connections that are even a little high latency or lossy. UDP drops packets even on wired LAN, for example. Practically all games use this method, and they all have lots of smoothing and prediction tech to make things seem like you're getting constant position updates -- which you almost certainly aren't.

You can also use this for data that doesn't matter -- although if it doesn't matter you should question why you're sending it in the first place.


If you need to 'solve' those, you use a reliable protocol instead. Like tcp.

In terms of actual implementation, protocols that do this need to keep state, regularly notify the other side what they've received, and retry if packets appear to have been lost. Doing it 'simple' is easy; maximising efficiency/performance is harder.
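The 'simple' version might look something like the following Python sketch: the sender remembers what has not been acknowledged and resends it after a timeout, and the receiver acknowledges everything and delivers in order. All names here (`ReliableSender`, `ReliableReceiver`, the 0.2 s timeout) are hypothetical, there is no real socket I/O, and nothing here is taken from any particular library.

```python
class ReliableSender:
    """Tracks unacknowledged messages and resends them after a timeout."""

    def __init__(self, resend_after=0.2):
        self.resend_after = resend_after  # seconds; arbitrary illustrative value
        self.next_seq = 0
        self.unacked = {}                 # seq -> (payload, time last sent)

    def send(self, payload, now):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (payload, now)
        return seq, payload               # what would go on the wire

    def on_ack(self, seq):
        self.unacked.pop(seq, None)       # duplicate acks are harmless

    def due_for_resend(self, now):
        due = []
        for seq, (payload, sent_at) in sorted(self.unacked.items()):
            if now - sent_at >= self.resend_after:
                self.unacked[seq] = (payload, now)  # restart the timer
                due.append((seq, payload))
        return due


class ReliableReceiver:
    """Acknowledges every message and delivers payloads in sequence order."""

    def __init__(self):
        self.expected = 0
        self.buffered = {}

    def on_message(self, seq, payload):
        """Return (seq_to_ack, payloads that are now deliverable in order)."""
        if seq >= self.expected:
            self.buffered[seq] = payload
        delivered = []
        while self.expected in self.buffered:
            delivered.append(self.buffered.pop(self.expected))
            self.expected += 1
        return seq, delivered


sender, receiver = ReliableSender(), ReliableReceiver()
first = sender.send("spawn", now=0.0)        # seq 0: pretend it is lost in transit
second = sender.send("score", now=0.0)       # seq 1: arrives
ack, out_a = receiver.on_message(*second)    # buffered; seq 0 is still missing
sender.on_ack(ack)
resent = sender.due_for_resend(now=0.25)     # seq 0 has timed out
ack, out_b = receiver.on_message(*resent[0])
sender.on_ack(ack)
```

The harder parts the paragraph alludes to (adaptive timeouts, window limits, batching acks) are exactly what this sketch leaves out.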

The other answer gives a good example of when you don't actually want to fix the issues you mention.


The moment you say "reliable messages with UDP" you really are saying "we are implementing TCP on our own", which quickly turns into the question "why?".

By its own admission, the project says: "The reliability layer is a pretty naive sliding window implementation". A bit scary if you really want to have guarantee message is delivered.

Also scary part: "Our use of OpenSSL is extremely limited; basically just AES encryption", "we do not support x509 certificates". OpenSSL is difficult, coming up with your own key exchange is likely problematic.

So I'm still searching for the answer as to why we need to do this? What's the market for this?

P.S. Yes, Valve being the author certainly brings some "redeeming value" to this!


You're assuming the usage of such a library is sending, let's say, a network-crushing amount of data. But this is really intended for games that try to fit under 512Kb/s (or even less). We're not trying to jam down 20MB/s of JS, 5 MB/s of CSS, and 100MB/s of PNGs. Reliable packets are sending you, at worst, "the state of the world" in a few KB.


You’re just thinking of a single game client. If you have a server running 100 matches, then it’s real bandwidth and will saturate network links.


Last time I bought some 10Gbit fiber (some AOC, the stuff with SFPs on both ends) it was about $70 for a 3 meter cable. Amortized cost of switch ports is maybe another $200. You paid $700 for a 10Gbit networking card with (say) four ports ... but a LOT more for the server to run all of this.

Back-of-envelope calculation: You can get about 50,000 client connections at 50 updates/sec of 500 bytes each on a 10Gbit link. Four ports, double redundancy gets you 100K clients on a server. Yike -- that's wayyy more clients than you want on a single server (you almost certainly run out of server-side CPU for game simulation and so forth before you run into bandwidth issues). A dual 40Gbit networking card is pretty cheap, but you'll run into CPU load issues trying to feed that card enough traffic -- it's frankly plenty hard to do that even when you're not doing game computation.
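Spelled out in Python, the arithmetic above looks like this. It is a sketch that counts payload bytes only; UDP, IP and Ethernet header overhead is ignored, so real numbers would be somewhat lower. The variable names are just for this example.

```python
LINK_BITS_PER_SEC = 10 * 10**9   # one 10 Gbit port
UPDATES_PER_SEC = 50
BYTES_PER_UPDATE = 500
PORTS = 4

# Per-client payload rate: 50 * 500 * 8 = 200,000 bit/s.
bits_per_client = UPDATES_PER_SEC * BYTES_PER_UPDATE * 8

# Clients that fit on one port if payload were the only traffic.
clients_per_link = LINK_BITS_PER_SEC // bits_per_client

# Four ports with double redundancy: half the aggregate capacity is usable.
clients_per_server = clients_per_link * PORTS // 2
```

This gives 50,000 clients per port and 100,000 per server, matching the figures in the paragraph.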

You can probably run all of your servers on 1Gbit copper for under $100 / port. There are better ways to wire things up, but I've done this in the past and it's worked fine.

Capital outlay for sufficient server bandwidth just isn't a big deal.

[edit: back-of-envelope calculation low by a factor of 10 :-) ]


70 dollars for 9ft of fiber with connectors? I almost always make my own cables (fiber included), but for some special jobs where there was a time constraint I'd use pre-made. That still seems very steep. Where are you sourcing cables?

I also suppose not everyone knows how to terminate fiber and I've done it so much it's become second nature.


It might have been closer to $35, I'd have to dig a little. Remember, this is with the SFPs, not just the raw fiber (which is significantly cheaper on its own, even if you buy it terminated).


This happens in TCP too; nothing in TCP is aware of other network connections: it only knows about non-ACKed packets and latency. But the fact that you can saturate a network link is a feature, right? You don't want there to be bandwidth capacity you can't take advantage of. This is an ops/administration issue where you simply don't put more clients on a server than bandwidth capacity will allow.


"Will" not. :) TCP will saturate network links too; can you be more specific? How much are you seeing 100-match boxes egress? A server is almost certainly going to see CPU or memory throughput bounds before egress.


> By its own admission, the project says: "The reliability layer is a pretty naive sliding window implementation". A bit scary if you really want to have guarantee message is delivered.

A "naive sliding window implementation" is the basis of reliable delivery in TCP. I think all they're saying here is that they don't have a concept like selective ACKs. But SACKs are basically an optimization, not an issue of correctness.


Given almost any networking problem, TCP/IP is guaranteed to solve it effectively but suboptimally. It simply makes too many assumptions about the needs of the next layer(s) in the protocol stack.

As others have pointed out, things like Nagling and exponential backoff are not wanted or needed in gaming. Any design compromises made in the name of retrying delivery of old data are bound to be counterproductive.


Head of line blocking. Packets go out of date fast in realtime games, better drop one than hold up everything.


I don’t see anything scary here. If a truly secure channel is required — and these are pretty rarely required for games — one might consider TLS instead, and simply accept the additional overhead.

Unless you’re sending executable and/or private content down the pipe, it’s rarely an issue for game networking stacks to eschew secure exchange entirely.


Rarely? I doubt that very much. If you don't encrypt game traffic, people will write packet analyzers that are undetectable by Anticheat, yet still provide some ESP to the cheater.


Isn’t it accurate to say though that encryption is a component of — but not the only property of — secure exchange? Or am I misunderstood?


In general, you are correct, with other components being the proper handling and usage of keys and disposal of pads.

Although if a friend and I met up in a desolate place where he whispered the secret to me, I'd argue the exchange was secure without encryption. So if you can exchange a message without anyone else knowing, encryption isn't necessary.


I’ve seen a number of companies do this - implementing a custom TCP-like protocol on top of UDP, optimized for the specific use case. I’ve seen a video game company build a library like this (early 2000’s) and I’ve also seen it done in managed file transfer.


"why?" To ship DOTA2



