For some reason, I thought this was about the update step in games that happens once per 'tick', that is, the physics engine loop. It's about lossless compression and downloading update packages, though. That's also fine with me.
Interesting, but I'd like more details on what's happening at the client.
Take Steam for example. For some games, downloading the update takes seconds, but calculating diffs and extracting takes 10-20 minutes. That's great for Valve, because little bandwidth is used, but terrible on the client side. On top of that, the update process slows the rest of the system almost to a halt because of all the hard drive activity.
I can potentially see this mechanism making the same mistake.
> On top of that, the update process slows the rest of the system almost to a halt, because of all the hard drive activity
As far as I'm aware, that's a problem only on Linux, because Windows has a desktop-grade IO scheduler tuned to interactive usage (whereas in Linux both the CPU and IO schedulers are written for maximum throughput).
Windows is quite capable of messing this one up as well; if your browser profile is on the same disk where you're doing something heavy, it'll end up blocking the browser while it waits to commit.
Blizzard blew it from the moment they started downloading/syncing their games via BitTorrent - using their customers' bandwidth to distribute games those customers had already paid for.
How exactly did Blizzard "blow it" with this design decision?
Using their customers' bandwidth to take some load off their own patch servers during peak times is (imho) a pretty good use of BitTorrent. You can also quite easily disable this feature in their launcher app.
Good article. As a note, I love how he uses hand-drawn diagrams. I have yet to find any tool that allows me to draw diagrams as fast as I can on a piece of paper.
I have a Surface; it works reasonably well for the same task. But the same applies - it's not "pretty" SVG graphics, but it's faster and easier. I suppose you could always make the initial diagram this way and recreate it in a 2D CAD-like program if you need to.
One thing that seemed glossed over (so I'm not sure if it's obvious for their use case) is the trade-off between compressing once and distributing many times.
When looking at how long it takes to compress vs transmit, the optimisation was done to make the sum of both as small as possible - minimise(time(compress) + time(transmit)).
Instead, it seems like what you actually want to do is - minimise(time(compress) + expected_transmissions * time(transmit)) - see the rough numbers below.
For any reasonable number of distributed copies of a game, it seems like this time to transmit will quickly come to dominate the total time involved.
I suspect, however, that the time to compress grows extremely quickly, for not much gain in compression, so the potential improvement is probably tiny even if you expect to be transmitting to millions of clients.
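Just to put some numbers on it, here's the kind of back-of-the-envelope comparison I mean. Everything here is invented - the bandwidth figure, the patch sizes, the "5% smaller if you compress for a day" assumption - it's only meant to spell out the second objective:

```python
# Toy cost model: minimise(time(compress) + N * time(transmit)) spelled out.
# All numbers below are made up for illustration.
def total_time(compress_seconds, patch_bytes, expected_transmissions,
               bandwidth_bytes_per_s=1_000_000):
    transmit_seconds = patch_bytes / bandwidth_bytes_per_s
    return compress_seconds + expected_transmissions * transmit_seconds

# A "fast" compressor vs a hypothetical one that takes a day longer
# but (we assume) shaves 5% off the patch size.
fast = total_time(compress_seconds=60, patch_bytes=100_000_000,
                  expected_transmissions=1_000_000)
slow = total_time(compress_seconds=86_400, patch_bytes=95_000_000,
                  expected_transmissions=1_000_000)
print(f"{fast:,.0f} vs {slow:,.0f} total seconds")
# ~100,000,060 vs ~95,086,400: with a million downloads, the extra day of
# compression would pay for itself many times over - if that 5% were real.
```

Of course the 5% is completely invented; as above, the real-world gain from compressing even harder is probably much smaller than that.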
The rsync example confuses me a little bit. If you add a single byte to the front, then all the bytes are shifted into different blocks and nearly none of them will hash to a match. But rsync still performs well when you do that. Can someone explain why that differs from the explanation?
The problem also applies to the binary delta: adding a prefix shifts everything forward, causing a diff in everything.
Bsdiff solves this with suffix sorting. But what does rsync do? Or am I just wrong that rsync still works well? In either case, I think the offset problem makes for a more interesting motivating example for bsdiff.
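Here's a toy of what I mean by suffix sorting sidestepping the offset problem (just an illustration, not what bsdiff actually does internally):

```python
# Build a (naive) suffix array of the old file, then look up a chunk of the
# new file by binary search. The match is found by content, so it doesn't
# matter how far an inserted prefix has shifted it.
import bisect

def suffix_array(data):
    # O(n^2 log n) construction - fine for a toy, not for real files.
    return sorted(range(len(data)), key=lambda i: data[i:])

def find_in_old(old, sa, chunk):
    # Return an offset in `old` where `chunk` occurs, or None.
    suffixes = [old[i:] for i in sa]      # materialised only for clarity
    pos = bisect.bisect_left(suffixes, chunk)
    if pos < len(sa) and old[sa[pos]:sa[pos] + len(chunk)] == chunk:
        return sa[pos]
    return None

old = b"ABCDEFGHIJKL"
new = b"x" + old                        # a one-byte prefix shifts everything
sa = suffix_array(old)
print(find_in_old(old, sa, new[5:9]))   # b"EFGH" is still found at offset 4
```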
The rsync algorithm divides the file into fixed-size blocks on one side only - in rsync it's actually the side that already has the old copy that checksums its blocks - and the other side then tries to match those blocks at every byte offset in its version of the file, not just at multiples of the block size. A rolling checksum makes testing every offset cheap.
Thus, in your example, all of the old blocks are still found, just shifted by an offset of 1, and only the inserted byte itself has to be sent as literal data.
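A quick toy version of that idea, in case it helps (this is nothing like rsync's actual implementation; the tiny block size, the md5 strong hash and the names are just for the demo):

```python
# One side checksums fixed-size blocks of the file it already has; the other
# side slides a window over its (new) file and tests every byte offset using
# a rolling Adler-32-style weak checksum, confirming candidates with a strong
# hash.
import hashlib

BLOCK = 4          # unrealistically small, just for the demo
MOD = 1 << 16

def weak(block):
    # Weak checksum components (a, b) of a whole block.
    a = sum(block) % MOD
    b = sum((len(block) - i) * c for i, c in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    # Slide the window one byte: drop out_byte, append in_byte.
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b

def signatures(old):
    # Checksum every fixed-size block of the old file.
    sigs = {}
    for i in range(0, len(old) - BLOCK + 1, BLOCK):
        blk = old[i:i + BLOCK]
        sigs.setdefault(weak(blk), []).append((i, hashlib.md5(blk).digest()))
    return sigs

def find_matches(new, sigs):
    # Scan the new file at every byte offset for blocks the old file has.
    matches = []
    a, b = weak(new[:BLOCK])
    for i in range(len(new) - BLOCK + 1):
        for old_off, strong in sigs.get((a, b), []):
            if hashlib.md5(new[i:i + BLOCK]).digest() == strong:
                matches.append((old_off, i))
        if i + BLOCK < len(new):
            a, b = roll(a, b, new[i], new[i + BLOCK], BLOCK)
    return matches

old = b"ABCDEFGHIJKL"
new = b"x" + old                          # one prepended byte
print(find_matches(new, signatures(old)))
# [(0, 1), (4, 5), (8, 9)] - every old block is still found, shifted by 1
```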
I wish GOG would also open up their client and release it cross-platform. Or at least document their protocol, as they promised.