
Wow, that's really impressive. I'll investigate, thank you for the link.



Hey, amos here, main developer of wharf/butler, here's a quick technical summary so you don't have to do the digging yourself:

- File formats are streams of protobuf messages - efficient serialization, easy to parse from a bunch of programming languages. Most files (patches, signatures) are composed of an uncompressed header followed by a compressed stream of other messages (brotli in the reference implementation; compression formats are pluggable).

- The main diff method is based on rsync. It's slightly tuned, in that: it operates over the hashes of all files (which means rename tracking is seamless - the reference implementation detects that and handles it efficiently), and it takes into account partial blocks (at the end of files, smaller than the block size)

- The reference implementation is quite modular Go, which is nice for portability, and, like elisee mentioned, it's used in production at itch.io. We assume most things are streaming (so that, for example, you can apply a patch while downloading it, no temporary writes to disk needed). We actually use a virtual file system for all downloads and updates.

- The reference implementation contains support for block-based (4MB default) file delivery, which is useful for a verify/heal process (figure out which parts are missing/have been corrupted and correct them)

- The wharf repo contains the basis of a second diff method, based on bsdiff - for a secondary patch optimization step. The bsdiff algorithm is well-commented with references to the original paper, and there's an opt-in parallel bsdiff codepath (as in multi-core suffix sorting, not just bsdiff operating on chunks)

- A few other companies (including well-known gaming actors) have started reaching out / using parts of wharf for their own usage, I'll happily name names as soon as it's all become more public :)
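For readers who haven't met the rsync approach mentioned in the list above, here is a minimal sketch of the general idea in Go: hash fixed-size blocks of the old data, then slide a window over the new data with a rolling weak checksum and confirm candidates with a strong hash. This is not wharf's code. The names (`weakSum`, `roll`, `diff`, `patch`, `op`), the 4-byte block size, and the choice of MD5 as the strong hash are all assumptions made for illustration, and unlike the tuned version described above it works on a single buffer and ignores partial trailing blocks.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

const blockSize = 4 // deliberately tiny so the example is easy to follow

// op is either "copy block BlockIndex from the old data" (BlockIndex >= 0)
// or "emit these literal bytes" (BlockIndex == -1).
type op struct {
	BlockIndex int
	Literal    []byte
}

type blockSig struct {
	index  int
	strong [md5.Size]byte
}

// weakSum computes an rsync-style (Adler-like) checksum over one window.
func weakSum(p []byte) (a, b uint32) {
	for i, c := range p {
		a += uint32(c)
		b += uint32(len(p)-i) * uint32(c)
	}
	return a & 0xffff, b & 0xffff
}

// roll slides the window by one byte: `out` leaves, `in` enters.
func roll(a, b uint32, out, in byte, n int) (uint32, uint32) {
	a = (a - uint32(out) + uint32(in)) & 0xffff
	b = (b - uint32(n)*uint32(out) + a) & 0xffff
	return a, b
}

// signature indexes every full block of the old data by its weak checksum.
func signature(old []byte) map[uint32][]blockSig {
	sigs := make(map[uint32][]blockSig)
	for i := 0; (i+1)*blockSize <= len(old); i++ {
		blk := old[i*blockSize : (i+1)*blockSize]
		a, b := weakSum(blk)
		key := a | b<<16
		sigs[key] = append(sigs[key], blockSig{i, md5.Sum(blk)})
	}
	return sigs
}

// diff describes target in terms of old blocks plus literal bytes.
func diff(old, target []byte) []op {
	sigs := signature(old)
	var ops []op
	var lit []byte
	flush := func() {
		if len(lit) > 0 {
			ops = append(ops, op{BlockIndex: -1, Literal: lit})
			lit = nil
		}
	}
	var a, b uint32
	i, fresh := 0, true // fresh: checksum must be recomputed from scratch
	for i+blockSize <= len(target) {
		win := target[i : i+blockSize]
		if fresh {
			a, b = weakSum(win)
			fresh = false
		}
		matched := -1
		if cands, ok := sigs[a|b<<16]; ok {
			strong := md5.Sum(win) // only pay for the strong hash on a weak hit
			for _, c := range cands {
				if c.strong == strong {
					matched = c.index
					break
				}
			}
		}
		if matched >= 0 {
			flush()
			ops = append(ops, op{BlockIndex: matched})
			i += blockSize
			fresh = true
			continue
		}
		lit = append(lit, target[i])
		if i+blockSize < len(target) {
			a, b = roll(a, b, target[i], target[i+blockSize], blockSize)
		}
		i++
	}
	lit = append(lit, target[i:]...)
	flush()
	return ops
}

// patch rebuilds the target from the old data and the op list.
func patch(old []byte, ops []op) []byte {
	var out []byte
	for _, o := range ops {
		if o.BlockIndex >= 0 {
			out = append(out, old[o.BlockIndex*blockSize:(o.BlockIndex+1)*blockSize]...)
		} else {
			out = append(out, o.Literal...)
		}
	}
	return out
}

func main() {
	old := []byte("aaaabbbbccccdddd")
	target := []byte("XXbbbbaaaaYccccdddd")
	ops := diff(old, target)
	fmt.Println(len(ops), string(patch(old, ops)) == string(target))
}
```

In the example, blocks that were reordered or shifted in `target` are still found, so only the inserted bytes travel as literals. Running the same matching over the blocks of a whole container rather than one buffer is what would make renames cheap, as described above.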

I'd be happy to answer any questions!
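The verify/heal idea from the list above can also be sketched in a few lines of Go. Again, this is an illustration under stated assumptions rather than the reference implementation: `blockHashes`, `verify`, `heal` and the `fetch` callback are hypothetical names, SHA-256 is an arbitrary choice of hash, and the block size is 8 bytes here instead of the 4MB default mentioned above.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

const blockSize = 8 // stand-in for the 4MB default, kept tiny for the demo

// blockHashes returns one hash per block; the last block may be short.
func blockHashes(data []byte) [][sha256.Size]byte {
	var hashes [][sha256.Size]byte
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[off:end]))
	}
	return hashes
}

// verify returns the indices of blocks that are missing or corrupted.
func verify(local []byte, want [][sha256.Size]byte, size int) []int {
	var bad []int
	for i, h := range want {
		off := i * blockSize
		end := off + blockSize
		if end > size {
			end = size
		}
		if end > len(local) || sha256.Sum256(local[off:end]) != h {
			bad = append(bad, i)
		}
	}
	return bad
}

// heal resizes local to the expected size and rewrites only the bad blocks.
// fetch is a hypothetical source of known-good blocks (e.g. a download).
func heal(local []byte, size int, bad []int, fetch func(i int) []byte) []byte {
	if len(local) < size {
		local = append(local, make([]byte, size-len(local))...)
	}
	local = local[:size]
	for _, i := range bad {
		copy(local[i*blockSize:], fetch(i))
	}
	return local
}

func main() {
	good := []byte("the quick brown fox jumps over the lazy dog")
	want := blockHashes(good)

	damaged := append([]byte(nil), good[:30]...) // truncated copy
	damaged[10] ^= 0xff                          // plus one flipped byte

	fetch := func(i int) []byte {
		end := (i + 1) * blockSize
		if end > len(good) {
			end = len(good)
		}
		return good[i*blockSize : end]
	}

	bad := verify(damaged, want, len(good))
	healed := heal(damaged, len(good), bad, fetch)
	fmt.Println(bad, bytes.Equal(healed, good))
}
```

The point of working per block is that only the blocks reported by `verify` need to be fetched again, rather than the whole file.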


That's incredibly thorough. How familiar are you with Blizzard's NGDP protocol / CASC file format?


Not at all, but after a cursory look it seems to solve a slightly different (and easier, imho) problem. I might be mistaken!


It's actually exactly what you described - the documentation is very sparse on it because it's an internal thing (I'm guessing you found the CASC documentation, not the NGDP one). If you're interested, shoot me an email and I can send you some more details, but it'd simply be for intellectual curiosity: as I said, it's an internal protocol.



