This is really cool, but why reinvent the wheel? SQLite, for instance, already has many years of optimization behind how it stores and accesses data on disk.
To make SQLite decentralized (like Hyperdrive) you can put the database in a torrent. Index it using full-text search (https://sqlite.org/fts5.html), for instance, then let the users seed it.
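A minimal sketch of what that indexing step could look like (table and column names are made up for illustration, and this assumes an SQLite build with FTS5 enabled):

    import sqlite3

    # Open (or create) the database that will later be put in a torrent.
    conn = sqlite3.connect("wiki.db")

    # A plain FTS5 table, one row per article (hypothetical schema).
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS articles USING fts5(title, body)")
    conn.execute(
        "INSERT INTO articles(title, body) VALUES (?, ?)",
        ("Alan Turing", "Alan Turing was a mathematician and computer scientist."),
    )
    conn.commit()

    # A full-text query; with a sqltorrent-style VFS only the torrent pieces
    # holding the relevant index/table pages would have to be fetched.
    for (title,) in conn.execute("SELECT title FROM articles WHERE articles MATCH ?", ("turing",)):
        print(title)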
Users can then use the sqltorrent Virtual File System (https://github.com/bittorrent/sqltorrent) to query the db without downloading the entire torrent - essentially it downloads only the torrent pieces needed to satisfy the query. I believe this is similar to the technique behind Hyperdrive, just built on standard, highly optimized tools that already exist: https://www.sqlite.org/vfs.html
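The core trick such a torrent-backed VFS relies on is simple to sketch: every byte-range read SQLite issues gets translated into the set of torrent pieces covering that range, and only those pieces need to be fetched. A rough illustration of the piece math (not sqltorrent's actual code, just the idea):

    def pieces_for_read(offset: int, length: int, piece_length: int) -> range:
        """Torrent piece indices needed to serve a read of `length` bytes at `offset`."""
        first = offset // piece_length
        last = (offset + length - 1) // piece_length
        return range(first, last + 1)

    # Example: SQLite asks for a 4096-byte page at byte offset 1,234,567 of a
    # torrent with 256 KiB pieces -> only piece 4 has to be present locally.
    print(list(pieces_for_read(1_234_567, 4096, 256 * 1024)))  # [4]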
Every time a new version of the SQLite db is published (say by Wikipedia), the peers can switch to the new torrent and reuse the pieces they already have, since SQLite's page-based storage touches as little of the file as possible (and hence as few pieces as possible) when the data is updated.
That's very interesting, thanks for the links. I work on Scuttlebutt (similar to Dat/Hypercore) and have been reimplementing our stack with 'boring' tooling like SQLite and HTTP, and I've been really enjoying it so far.
I'm going to read your blog post now, thanks a lot for the new info.
Christian has also put some work into the underlying database and such lately, but the user-facing part of that is Oasis [1], which aims to be an ssb interface with a no-JS UI, with all the logic handled by the (locally running) nodeJS server.
It seems like the blog post answers your question pretty thoroughly. The Hyperdrive index and the protocol are tuned for this use case, which is what lets it scale to hosting a Wikipedia clone. BitTorrent FS + SQLite are not tuned for this use case.
Compressed with 7-Zip, sure, but uncompressed, the entire thing takes up 10TB. The Hyperdrive post doesn't mention compression at all, so the comparison should be without it.
> As of June 2015, the dump of all pages with complete edit history in XML format at enwiki dump progress on 20150602 is about 100 GB compressed using 7-Zip, and 10 TB uncompressed.
What do you mean? The author (say, the Wikipedia owners) can change the db as they normally would (using UPDATE queries, say). Those write queries touch a minimal number of disk pages. In torrent terms this means a minimal set of pieces gets modified and needs to be re-downloaded by users.
No, the pieces you already downloaded can be reused for the new torrent. Unchanged pieces end up with the same hash, so they carry over to the new torrent's digest: http://bittorrent.org/beps/bep_0038.html
This is also why SQLite is a good choice: it's highly optimized to change as few of its pages - and hence torrent "pieces" - as possible when an update occurs.
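A rough way to check that claim yourself (hypothetical file and table names, and it assumes the db is not in WAL mode so writes land in the main file): copy the database, run an UPDATE on the copy, and count how many fixed-size "pieces" actually differ:

    import hashlib
    import shutil
    import sqlite3

    PIECE = 256 * 1024  # a typical BitTorrent piece size

    def piece_hashes(path):
        """SHA-1 of each fixed-size chunk of a file, like a torrent's piece list."""
        hashes = []
        with open(path, "rb") as f:
            while chunk := f.read(PIECE):
                hashes.append(hashlib.sha1(chunk).hexdigest())
        return hashes

    # Make a "new version" of the db and apply a small update to it.
    shutil.copyfile("wiki.db", "wiki_v2.db")
    conn = sqlite3.connect("wiki_v2.db")
    conn.execute("UPDATE articles SET body = body || ' (updated)' WHERE rowid = 1")
    conn.commit()
    conn.close()

    old, new = piece_hashes("wiki.db"), piece_hashes("wiki_v2.db")
    changed = sum(1 for a, b in zip(old, new) if a != b) + abs(len(old) - len(new))
    print(f"{changed} of {max(len(old), len(new))} pieces changed")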
If you implement this behavior yourself - handling all kinds of different queries, building a query engine on top of that, optimizing for efficiency and reliability - you're effectively rewriting a database. Sure, you can do it, but why not take advantage of battle-tested off-the-shelf tools for things like "databases" (SQLite) and "distributing data" (torrents)?
All those BEPs are in "Draft" status. Okay, libtorrent implements two of them. But also, BEP 39 (Updating Torrents Via Feed URL) doesn't really fit very well into the fully distributed setting because of the centralized URL part.
So now, to update the torrent file, you need a mechanism for a mutable document that you can update in a distributed but signed way. Or you could make an append-only feed of sequential torrent URLs... oh wait.
My point is: Hyperdrive's scope is sufficiently different from your proposed solution that yes, you could probably rely on existing tools (and I have much love for bittorrent based solutions!) but it starts feeling like shoehorning the problem into a solution that doesn't quite fit.
That draft status is of little practical relevance, though, if nothing has changed for years and no one has voiced well-founded criticism of the technical details.
I do agree though that Hyperdrive is different from what the bittorrent ecosystem has to offer. I too like not reinventing the wheel where that's not necessary, as you recommend there. I'll leave you the list of BEPs for further reading, in case you're interested: https://www.bittorrent.org/beps/bep_0000.html
I've been keeping an eye on that list for a long time. There's some really cool stuff in there, and I think bittorrent has really been within reach of being "simply good enough for most applications" for quite some time now. And the massive user base is of course a good thing there, especially if you're talking more about archival projects.
Would a sqltorrent setup make sense for sharing scraped/pulled data among users? Each user could run the data extraction themselves, or check whether anyone on the swarm already has ingested chunks to their liking. Everything is append-only and content-addressable at its base.
I've been looking around at IPFS, dat, hyperdrive, etc., and it seems like dat is the most natural setting for this, but sqltorrent is new to me.
Wouldn't seeding be a problem? You would need to seed from something that supports WebTorrent, which uses WebRTC.
With dat-sdk, users just need to go to a webpage. You really just need WebRTC without torrents.
I got rid of multiwriter by just giving each user their own dat archive and having users share their dat addresses with each other. Each user writes only to their own archive; when that happens, events are emitted and the users listening write to theirs.
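This isn't dat-sdk's actual API, but the pattern being described - one single-writer, append-only feed per user, with everyone reacting to appends on the feeds they follow - can be modeled in a few lines (names made up for illustration):

    class Feed:
        """Toy model of a single-writer, append-only feed identified by an address."""
        def __init__(self, address):
            self.address = address   # stands in for a dat/hyper key
            self.entries = []        # the append-only log
            self.listeners = []      # callbacks fired on every append ("events emit")

        def append(self, entry):
            self.entries.append(entry)
            for callback in self.listeners:
                callback(self.address, entry)

    # Each user owns exactly one feed and writes only to it.
    feeds = {address: Feed(address) for address in ("alice", "bob")}

    def follow(own_feed, other_feed):
        # When the other user appends, react by appending to our own feed.
        other_feed.listeners.append(
            lambda address, entry: own_feed.append(f"saw {entry!r} from {address}")
        )

    follow(feeds["bob"], feeds["alice"])
    feeds["alice"].append("hello")
    print(feeds["bob"].entries)  # ["saw 'hello' from alice"]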
If enough users stay online, listening to each other's addresses, I only need a web client.
Also, with offline support like Workbox Background Sync, I don't even need the internet - information transfers device to device with just an offline PWA. At least that's my goal.
I talk a bit about it here: https://medium.com/@lmatteis/torrentnet-bd4f6dab15e4
Again, I'm not against redoing things better, but why not use existing, proven tech for certain parts of the tool?