
An instructive thing here is that a lot of stuff has not improved since ~2004 or so. Working around the things that have not improved (memory latency, from RAM all the way down to L1 cache, really) requires fine control of memory layout and minimizing cache pollution. That is difficult with all of our popular garbage-collected languages, even harder with languages that don't offer memory layout controls, and JITs and interpreters add further difficulty.

To get the most out of modern hardware you need to:

* minimize memory usage/hopping to fully leverage the CPU caches

* control data layout in memory to leverage the good throughput you can get when you access data sequentially

* be able to fully utilize multiple cores without too much overhead and with minimal risk of error

For programs to run faster on new hardware, you need to be able to do at least some of those things.
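
To make the layout point concrete, here is a minimal C sketch (the struct names are made up for illustration) of the classic array-of-structs vs. struct-of-arrays trade-off: both loops do the same work, but the SoA version reads one contiguous stream, so nearly every byte pulled into cache is useful.

    #include <stddef.h>

    /* Array-of-structs: each 'hp' is surrounded by fields we don't need,
       so only a few bytes of every 64-byte cache line are useful. */
    struct EntityAoS { float x, y, z, hp; int flags; };

    float total_hp_aos(const struct EntityAoS *e, size_t n) {
        float sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += e[i].hp;          /* strided access, most of each line wasted */
        return sum;
    }

    /* Struct-of-arrays: the 'hp' values are contiguous, so the traversal
       is sequential and prefetcher-friendly. */
    struct EntitiesSoA { float *x, *y, *z, *hp; int *flags; size_t n; };

    float total_hp_soa(const struct EntitiesSoA *e) {
        float sum = 0;
        for (size_t i = 0; i < e->n; i++)
            sum += e->hp[i];         /* sequential access, full cache-line use */
        return sum;
    }

The same loop also splits cleanly across cores (each core sums a disjoint range with no shared mutable state), which is the third point on the list.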




It's pretty remarkable that, for efficient data processing, it's super super important to care about memory layout / cache locality in intimate detail, and this will probably be true until something fundamental changes about our computing model.

Yet somehow this is fairly obscure knowledge unless you're into serious game programming or a similar field.


> Yet somehow this is fairly obscure knowledge unless you're into serious game programming or a similar field.

Because the impact of optimizing for hardware like that isn't so important in many applications. Getting the absolute most out of your hardware is very clearly important in game programming, but in web apps where the scale being served is not huge (the vast majority)? Not so much. And in that context developer time is more valuable, when you can throw hardware at the problem for less.

In traditional game programming you had to run on the hardware people played on; you were constrained by the client's abilities. Cloud gaming might(?) be changing some of that, but GPUs are also very expensive compared to the rest of the computing hardware. Even in that case, with the amounts of data you are pushing you need to be efficient within the context of the GPU, and my feeling is it's not easily horizontally scaled.


IMO we are only scratching the surface of cloud gaming so far. Right now it's pretty much exclusively lift-and-shift, hosted versions of the same game, in many cases running on consumer GPUs. Cloud gaming allows for the development of cloud-native games that are so resource intensive (potentially architected so that more game state is shared across users) that they would not be possible to implement on consumer hardware. They could also use new types of GPUs that are designed for more multi-tenant use cases. We could even see ASICs developed for individual games!

I think the biggest challenge is that designing these new types of games is going to be extremely hard. Very few people are actually able to design performance intensive applications from the ground up outside of well-scoped paradigms (at least web servers, databases, and desktop games have a lot of prior art and existing tools). Cloud native games have almost no prior art and almost limitless possibilities for how they could be designed and implemented, including as I mentioned even novel hardware.


I've thought about this off and on, and there are certainly interesting possibilities. You can imagine a cloud renderer that does something like a global scatter / photon mapping pass, while each client's session on the front-end tier does an independent gather/render. Obviously there are huge problems with making something like this work practically, but I just mention it as an example of the sort of more novel directions we should at least consider.


If the "metaverse" ever gets anywhere beyond Make Money Fast and reaches AAA title quality, running the client in "the cloud" may be useful. Mostly because the clients can have more bandwidth to the asset servers. You need more bandwidth to render locally than to render remotely.

The downside is that VR won't work with that much network latency.


TBH I don't think cloud gaming is a long-term solution. It might be a medium-term solution for people with cheap laptops, but eventually the chips in cheap laptops will be able to produce photorealistic graphics and there will be no point going any further than that.


Photorealistic graphics ought to be enough for anybody? That seems unlikely: there are so many aspects to graphical immersion that there is still plenty of room for improvement, and AAA games will find them. Photorealistic graphics is a rather vague target; it depends on what and how much you're rendering. Then you need to consider that demand grows with supply, e.g. higher resolutions and even higher refresh rates.


There are diminishing returns. If a laptop could play games at the quality of a top end PC today, would people really want to pay for an external streaming service, deal with latency, etc just so they can get the last 1% of graphical improvements?

We have seen there are so many aspects of computing where once it’s good enough, it’s good enough. Like how onboard DACs got good enough that even the cheap ones are sufficient and the average user would never buy an actual sound card or usb dac. Even though the dedicated one is better, it isn’t that much better.


I think what you're missing is that

1) you still need to install and maintain it, and there are many trends, even professionally, toward avoiding that

2) just because you could get it, many may not want it. I could easily see people settling for a nice M1 MBA or M1 iMac and just streaming the games if their internet is fine. Heck, wouldn't it be nicer to play some PC games in the living room, like you can do with Steam Link?

3) another comment brings a big point that this unlocks a new "type" of game which can be designed in ways that take advantage of more than a single computer's power to do games with massively shared state that couldn't be reliably done before.

I think to counter my own points: 1) I certainly have a beefy desktop anyway, 2) streamed graphics are not even close to local graphics (a huge point), 3) there is absolutely zero way they're gonna stream VR games from a DC to an average residential home within 5 years IMHO.


I think the new MacBooks are more proof that cloud streaming won't be needed. Apple is putting unreal amounts of speed in low-power devices. If the M9 MacBook could produce graphics better than the gaming PCs of today, would anyone bother with cloud streaming when the built-in processing produces a result that is good enough? I'm not sure maintenance really plays much of a part; there is essentially no maintenance of local games, since the clients take care of managing it all for you.

Massive shared state might be something that is useful. I have spent some time thinking about it, and the only use case I can think of is highly detailed physics simulations with destructible environments in multiplayer games, where synchronization traditionally becomes a nightmare since minor differences cascade into major changes in the simulation.

But destructible environments and complex physics are a trend that came and went. Even in single-player games, where it's easy, they take too much effort to develop and are simply a gimmick to players which adds only a small amount of value. Everything else seems easier to just synchronize by passing messages around.


> If a laptop could play games at the quality of a top end PC today, would people really want to pay for an external streaming service, deal with latency, etc just so they can get the last 1% of graphical improvements?

Think of it in a different direction: if/when cloud rendering of AAA graphics is practical, you can get a very low-friction, Netflix-like experience where you just sit down and go.


IMO the service of Netflix is the content library, not the fact that it's streaming. If the entire show downloaded before playing, it would only be mildly less convenient than streaming it. But I don't think streaming adds that much convenience to gaming. If your internet is slow enough that downloading the game beforehand is a pain, then streaming is totally out of the question. And gaming is way, way less tolerant of network disruption, since you can't buffer anything.

Cloud gaming seemingly only helps in the case where you have weak hardware but want to play AAA games. If we could put "good enough" graphics in every device, there would be no need to stream. And I think in 10 years probably every laptop will have built-in graphics that are so good that cloud gaming is more trouble than it's worth. It might sound unrealistic to say there is a "good enough", but I think a lot of things have already reached this point. These days screen DPI is largely good enough, sound quality is good enough, device weight/slimness is good enough, etc.


I'd (gently) say you may be generalizing your own behavior too much. I often have, say, 45 minutes to kill and will just browse Netflix to find something to start immediately. Having to wait for a download would most likely send me to something else. Since COVID started, one thing I've heard repeatedly from friends with kids is that they manage to carve out an hour or so for a gaming session, sit down, and then have to wait through a mandatory update that ends up killing much of it. Now add the popularity of Game Pass, and the possibility that a "cloud console" offers something similar... there are plenty of people that would love that service imo.


Cloud gaming allows for more shared state and computationally intensive games (beyond just graphics). Maybe eventually clients will easily be able to render 4K with tons of shaders, but the state they're rendering could still be computed remotely. In a way that's kind of what multiplayer games are like already.


Differentiable programming is becoming more and more popular. If there were accurate models of memory/cache behavior, we could predict how code changes would change performance due to CPU behavior, and might be able to get programming tools that can make this way more visible to people who don't know about it. But I have my doubts it will be as easy as I make it sound - and I don't think I make it sound all that easy either :)
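
As a toy illustration of the kind of model that would be needed (the parameters and the whole function are made up here, not a real tool): even a crude analytical estimate of which level of the hierarchy a working set lands in predicts large performance cliffs, and something like this, made accurate and differentiable, is what such tooling would have to expose.

    #include <stdio.h>

    /* Toy cost model: estimate latency per random access for a given
       working-set size. Capacities and latencies are illustrative only. */
    struct level { const char *name; double bytes; double latency_ns; };

    static double est_latency_ns(double working_set_bytes) {
        static const struct level hier[] = {
            { "L1",  32e3,    1.0 },
            { "L2",   2e6,    4.0 },
            { "L3",  32e6,   15.0 },
            { "RAM", 1e18,  100.0 },
        };
        /* Crude assumption: accesses hit the first level the set fits in. */
        for (int i = 0; i < 4; i++)
            if (working_set_bytes <= hier[i].bytes)
                return hier[i].latency_ns;
        return 100.0;
    }

    int main(void) {
        const double sizes[] = { 16e3, 1e6, 16e6, 1e9 };
        for (int i = 0; i < 4; i++)
            printf("%10.0f bytes -> ~%.0f ns per random access\n",
                   sizes[i], est_latency_ns(sizes[i]));
        return 0;
    }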


I'm always disappointed that no one has come up with a more realistic model for asymptotic time complexity comparisons, one using a computation model with asymptotically increasing memory access times.

It's a pretty sad state of affairs when the main way we talk about algorithm performance suggests that traversing a linked list is as fast as traversing an array.
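
A quick C sketch of the gap that the uniform-cost model hides (illustrative only; numbers depend heavily on the machine): both loops are O(n), but the array traversal streams cache lines sequentially while the list chases pointers scattered across memory, and on typical hardware the latter is many times slower.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    struct node { struct node *next; long val; };

    int main(void) {
        enum { N = 1 << 23 };                       /* ~8M elements */
        long *arr = malloc(N * sizeof *arr);
        long *order = malloc(N * sizeof *order);
        struct node *nodes = malloc(N * sizeof *nodes);

        /* Link the nodes in shuffled order so traversal hops around memory. */
        for (long i = 0; i < N; i++) { arr[i] = i; nodes[i].val = i; order[i] = i; }
        srand(1);
        for (long i = N - 1; i > 0; i--) {
            long j = rand() % (i + 1);
            long t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (long i = 0; i < N - 1; i++) nodes[order[i]].next = &nodes[order[i + 1]];
        nodes[order[N - 1]].next = NULL;

        clock_t t0 = clock();
        long sum = 0;
        for (long i = 0; i < N; i++) sum += arr[i];                     /* sequential */
        clock_t t1 = clock();
        for (struct node *p = &nodes[order[0]]; p; p = p->next)
            sum += p->val;                                              /* pointer chasing */
        clock_t t2 = clock();

        printf("array: %.0f ms, list: %.0f ms (sum=%ld)\n",
               (t1 - t0) * 1000.0 / CLOCKS_PER_SEC,
               (t2 - t1) * 1000.0 / CLOCKS_PER_SEC, sum);
        return 0;
    }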


Physics is the cause of the memory latency. As such, the fundamental aspect of this won't ever go away: you can randomly access a small amount of data faster than you can a much larger amount. This is because storing information takes up space, and the speed of light is finite.


The logical conclusion is that most fields have so much data-processing capacity relative to their problem size that they don't need to worry about efficient data processing.


Why can't we abstract over this and optimize per system at runtime?


It's interesting that L2 cache has basically been steady at 2 MB/core since 2004 as well. It hasn't changed speed in that time, but it is still an order of magnitude faster than main memory across that whole timeframe. Does this suggest that the memory speed bottleneck means there simply hasn't been a need to increase the availability of that faster cache?


Every level of cache strikes a balance between latency & capacity. Bigger caches have higher latency; it's a fundamental property of caches.

What you can conclude is that 0.5-2 MB and 12-15 cycles of latency have been a steady sweet spot for L2 size for twenty years.

Sidebar: it was a property of caches. 3D assembly may upend the local optima.


Some of these numbers are clearly wrong. Some of the old latency numbers seem somewhat optimistic (e.g. 100 ns main memory reference in 1999), while some of the newer ones are pessimistic (e.g. 100 ns main memory reference in 2020). The disk bandwidth is clearly wrong, as it claims ~1.2 GB/s for a hard drive in 2020. The seek time is also wrong: the page has it crossing 10 ms in 2000, dropping to 5 ms by 2010, and reaching 2 ms by 2020, which looks like linear interpolation to me. It's also unclear what the SSD data is supposed to mean before ~2008, as SSDs were not really a commercial product before then. Also, for 2020 the SSD transfer rate is given as over 20 GB/s, and main memory bandwidth as 300+ GB/s.

Cache performance has increased massively, especially bandwidth, which is not reflected in a latency chart. Bandwidth and latency are of course related: just transferring a cache line over a PC66 memory bus takes a lot longer than 100 ns, while the same transfer on DDR5 takes a nanosecond or so, which leaves almost all of the latency budget for the latency itself.
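
Rough back-of-the-envelope for that, assuming a 64-byte line, an 8-byte-wide bus, and ignoring CAS and command overhead:

    PC66:      64 B / (8 B x 66 MHz)     ~ 120 ns just for the burst
    DDR5-4800: 64 B / (8 B x 4800 MT/s)  ~ 1.7 ns per channel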

edit: https://github.com/colin-scott/interactive_latencies/blob/ma...

The data on this page is simply extrapolated using formulas and guesses.


The oldest latency numbers were based on actual hardware Google had on hand at the time. Most of them came from microbenchmarks checked into google3. Occasionally an engineer would announce they had found an old parameter tuned for the Pentium, and cranking it up got another 1-2% performance gain on important metrics.

Many of the newer numbers could be based on existing Google hardware; for example, Google deployed SSDs in 2008 (custom designed and manufactured even before then) because hard drive latency wasn't getting any better. That opened up a bunch of new opportunities. I worked on a project that wanted to store a bunch of data; Jeff literally came to me with code that he said "compressed the data enough to justify storing it all on flash, which should help query latency" (this led to a patent!).


Bigger caches could help, but as a rule of thumb the miss rate only drops roughly with the square root of cache size, so the benefit diminishes. And the bigger you make a cache, the slower it tends to be, so at some point you can make your system slower by making your cache bigger and slower.
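
To put a rough number on that rule of thumb (the exact exponent varies a lot by workload):

    miss(2 MB) / miss(1 MB) = 1/sqrt(2) ~ 0.71   (about 29% fewer misses for 2x the size)
    miss(4 MB) / miss(1 MB) = 1/sqrt(4) = 0.50   (half the misses for 4x the size)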


The bigger the cache, the longer it takes to address it, and fairly fundamental physics prevents it from being faster.



