
This is really cool, but why reinvent the wheel? For instance, SQLite already has many years of optimization behind how it stores and accesses data on disk.

To make SQLite decentralized (like Hyperdrive) you can put it in a torrent. Index it with full-text search (https://sqlite.org/fts5.html), for instance, then let the users seed it.
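A minimal sketch of that indexing step, assuming a Node setup with the better-sqlite3 package, an SQLite build that includes FTS5, and placeholder table/column names ("articles" with "id", "title", "body"):

    // Hypothetical: build an FTS5 index over an existing "articles" table.
    import Database from "better-sqlite3";

    const db = new Database("wikipedia.db");

    db.exec(`
      CREATE VIRTUAL TABLE IF NOT EXISTS articles_fts
      USING fts5(title, body, content='articles', content_rowid='id');
      INSERT INTO articles_fts(articles_fts) VALUES('rebuild');
    `);

    // Anyone holding (or partially holding) the .db can then run queries like:
    const rows = db
      .prepare("SELECT title FROM articles_fts WHERE articles_fts MATCH ? LIMIT 10")
      .all("distributed systems");
    console.log(rows);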

Users can use the sqltorrent virtual file system (https://github.com/bittorrent/sqltorrent) to query the db without downloading the entire torrent - it knows to fetch only the torrent pieces needed to satisfy the query. I believe this is a similar technique to the one behind Hyperdrive, just built from standard, highly optimized tools that already exist: https://www.sqlite.org/vfs.html
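To make the VFS idea concrete, here's a rough sketch of the underlying arithmetic (not sqltorrent's actual API), assuming SQLite's default 4096-byte pages and 256 KiB torrent pieces:

    // Hypothetical: which torrent pieces does a given SQLite read need?
    const PAGE_SIZE = 4096;        // assumed SQLite page size
    const PIECE_SIZE = 256 * 1024; // assumed torrent piece size

    function piecesForRead(offset: number, length: number): number[] {
      const first = Math.floor(offset / PIECE_SIZE);
      const last = Math.floor((offset + length - 1) / PIECE_SIZE);
      const pieces: number[] = [];
      for (let p = first; p <= last; p++) pieces.push(p);
      return pieces;
    }

    // A VFS like sqltorrent intercepts SQLite's read calls and only asks the
    // swarm for the pieces covering the requested byte range, e.g. page 1000:
    console.log(piecesForRead(999 * PAGE_SIZE, PAGE_SIZE)); // -> [15]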

Every time a new version of the SQLite db is published (say, by Wikipedia), the peers can switch to the new torrent and reuse the pieces they already have, since SQLite organizes its pages in a way that minimizes file changes (and hence piece changes) when the data is updated.

I talk a bit about it here: https://medium.com/@lmatteis/torrentnet-bd4f6dab15e4

Again, I'm not against redoing things better, but why not use existing, proven tech for certain parts of the tool?



That's very interesting, thanks for the links. I work on Scuttlebutt (similar to Dat/Hypercore) and have been reimplementing our stack with 'boring' tooling like SQLite and HTTP, and I've been really enjoying it so far.

I'm going to read your blog post now, thanks a lot for the new info.


Interesting! Is there any document that I can read about your reimplementation? Or any code?


Christian has also put some work into the underlying database and such lately, but the user-facing part of that is Oasis [1], which aims to be an SSB interface with a no-JS UI, all the logic being handled by the (locally running) Node.js server.

[1]: https://github.com/fraction/oasis


It seems like the blog post answers your question pretty thoroughly. The Hyperdrive index and the protocol are tuned for this use case, which is what lets it scale to hosting a Wikipedia clone. A BitTorrent FS + SQLite is not tuned for this use case.


Wikipedia’s text history absolutely fits on a tiny hard drive and is easy to get a replica of.


Compressed with 7-Zip, sure, but uncompressed, the entire thing takes up 10TB. The Hyperdrive post doesn't mention compression at all, so the comparison should be without it.

> As of June 2015, the dump of all pages with complete edit history in XML format at enwiki dump progress on 20150602 is about 100 GB compressed using 7-Zip, and 10 TB uncompressed.

From: https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia#Si...

It's a bit of a nitpick either way; you're right that Wikipedia may not be the best example, since 10TB is still relatively small.


The big asterisk is, this only works if your database never changes.


Not sure if the parent edited their post after you wrote this, but note that they explained a technique to accommodate database updates/changes/edits.


What do you mean? The author (say, the Wikipedia owners) can change the db as they normally would (using UPDATE queries, say). Those write queries touch the smallest possible number of disk pages. In the torrent world, that translates to a minimal set of modified pieces that users need to download.


Last I checked you can't update a torrent. So if Wikipedia changes even a single letter, you'd need to download all the data once more.


No, the pieces you already downloaded can be reused for the new torrent: unchanged pieces have the same hash, so they don't need to be fetched again: http://bittorrent.org/beps/bep_0038.html
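A hedged sketch of how a client could decide which pieces to reuse, assuming fixed-size SHA-1 pieces (the file names are placeholders, and a real client would read the new piece hashes from the updated .torrent rather than hashing both files locally):

    // Hypothetical: find which locally held pieces are still valid for an
    // updated torrent of the same SQLite file.
    import { createHash } from "crypto";
    import { readFileSync } from "fs";

    const PIECE_SIZE = 256 * 1024; // assumed piece size

    function pieceHashes(buf: Buffer): string[] {
      const hashes: string[] = [];
      for (let off = 0; off < buf.length; off += PIECE_SIZE) {
        const piece = buf.subarray(off, off + PIECE_SIZE);
        hashes.push(createHash("sha1").update(piece).digest("hex"));
      }
      return hashes;
    }

    const oldHashes = pieceHashes(readFileSync("wikipedia-v1.db"));
    const newHashes = pieceHashes(readFileSync("wikipedia-v2.db"));

    // Only the pieces whose hashes changed need to come from the swarm.
    const toFetch = newHashes
      .map((hash, i) => (oldHashes[i] === hash ? -1 : i))
      .filter((i) => i !== -1);
    console.log(`need ${toFetch.length} of ${newHashes.length} pieces`);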

This is also why SQLite is a good choice: it's highly optimized to make the smallest possible change to its pages (and hence to the torrent's "pieces") when an update occurs.

If you're implementing this behavior yourself - handling all kinds of different queries, building a query engine on top of that, optimizing for efficiency and reliability - you're effectively rewriting a database. Sure you can do it, but why not take advantage of battle-tested, off-the-shelf stuff for things like "databases" (SQLite) and "distributing data" (torrents)?


Actually, there is a solution to this. Just combine https://www.bittorrent.org/beps/bep_0030.html (Merkle-tree-based hashing) with https://www.bittorrent.org/beps/bep_0039.html (feed-URL-based updates), and in some settings also https://www.bittorrent.org/beps/bep_0047.html (specifically the padding files, so that flat files inside a torrent can be shared efficiently in arbitrary combinations without partial pieces).
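The padding-file part is simple arithmetic: each real file is followed by a dummy file that pads it out to the next piece boundary, so identical files always occupy identical, independently reusable pieces. A rough sketch under an assumed 256 KiB piece size:

    // Hypothetical BEP 47-style padding: pad each file to a piece boundary.
    const PIECE_SIZE = 256 * 1024; // assumed piece size

    function paddingFor(fileLength: number): number {
      const remainder = fileLength % PIECE_SIZE;
      return remainder === 0 ? 0 : PIECE_SIZE - remainder;
    }

    // A 300 KiB file gets a 212 KiB pad file after it, so the next real file
    // starts exactly on a piece boundary and its pieces hash the same no
    // matter which other files share the torrent.
    console.log(paddingFor(300 * 1024)); // 217088 bytes of padding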


All those BEPs are in "Draft" status. Okay, libtorrent implements two of them. But BEP 39 (Updating Torrents Via Feed URL) doesn't really fit a fully distributed setting, because the feed URL is a centralized component.

So now, to update the torrent file, you need a mechanism for a mutable document you can update in a distributed but signed way. Or you could make an append-only feed of sequential torrent URLs... oh wait.

My point is: Hyperdrive's scope is sufficiently different from your proposed solution that yes, you could probably rely on existing tools (and I have much love for bittorrent based solutions!) but it starts feeling like shoehorning the problem into a solution that doesn't quite fit.


The distributed-but-signed way is there in https://www.bittorrent.org/beps/bep_0046.html (Updating Torrents Via DHT Mutable Items).
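For reference, the heart of BEP 46 is an ed25519-signed, sequence-numbered value in the DHT (per BEP 44) that points at the current info-hash. Here's a minimal sketch of the signing side, assuming Node's built-in ed25519 support; the bencoding is hand-rolled and simplified, not a full BEP 44 implementation:

    // Hypothetical sketch of a BEP 46-style mutable pointer: sign a sequence
    // number plus the latest torrent's info-hash with an ed25519 key.
    import { generateKeyPairSync, sign, verify } from "crypto";

    // One keypair per publisher; the public key is the address peers follow.
    const { publicKey, privateKey } = generateKeyPairSync("ed25519");

    // Simplified: BEP 44 signs the bencoded "seq" and "v" fields together,
    // and BEP 46 puts the latest info-hash inside "v".
    function encodeSigned(seq: number, infoHash: Buffer): Buffer {
      const v = Buffer.concat([Buffer.from("d2:ih20:"), infoHash, Buffer.from("e")]);
      return Buffer.concat([Buffer.from(`3:seqi${seq}e1:v`), v]);
    }

    const payload = encodeSigned(42, Buffer.alloc(20, 0xab)); // dummy info-hash
    const signature = sign(null, payload, privateKey);

    // Peers rebuild the same payload and verify it against the public key,
    // accepting the new info-hash only if seq increased.
    console.log(verify(null, payload, publicKey, signature)); // true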

That draft status is of little practical relevance, though, if nothing has changed for years and no one has voiced well-founded criticism of the technical details.

I do agree though that Hyperdrive is different from what the bittorrent ecosystem has to offer. I too like not reinventing the wheel where that's not necessary, as you recommend there. I'll leave you the list of BEPs for further reading, in case you're interested: https://www.bittorrent.org/beps/bep_0000.html


I've been keeping an eye on that list for a long time. There's some really cool stuff in there, and I think bittorrent has really been within reach of being "simply good enough for most applications" for quite some time now. And the massive user base is of course a good thing there, especially if you're talking more about archival projects.


Seems like someone should make a frontend for what you just said, especially that last part, which would get annoying to do manually.


Would a sqltorrent setup make sense for sharing scraped/pulled data amongst users, so each user can run the data extraction themselves or check whether anyone on the swarm has ingested chunks to their liking? Everything is append-only and content-addressable at its base.

I've been looking around at IPFS, Dat, Hyperdrive, etc., and it seems like Dat is the most natural setting for this, but sqltorrent is new to me.


"Sharing data amongst user" - torrents excel at this. Do you have a specific use-case in mind?


That is pretty clever!


Wouldn't seeding be a problem? You would need to seed from something that supports WebTorrent, which uses WebRTC.

With dat-sdk, users just need to go to a webpage. You really just need WebRTC without torrents.

I got rid of multiwriter by just giving each user their own Dat archive and having users share their Dat addresses with each other. Each user writes only to their own archive; when that happens, events are emitted and listening users write to theirs.

If enough users stay online, listening to each other's addresses, I only need a web client.
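A very rough sketch of that single-writer pattern, with hypothetical names standing in for a Hyperdrive-like API (this shows the shape of the logic, not actual dat-sdk calls):

    // Hypothetical: one append-only archive per user; each user follows the
    // archives of their peers. The Archive interface is an assumed stand-in.
    interface Archive {
      key: string;                                  // address shared with peers
      append(entry: string): Promise<void>;         // only the owner writes here
      onAppend(cb: (entry: string) => void): void;  // anyone can watch it
    }

    class Peer {
      constructor(private mine: Archive, others: Archive[]) {
        // When a followed peer appends something, copy it into my own archive
        // tagged with its origin, so my archive alone is enough to serve me.
        // (A real implementation would deduplicate already-copied entries.)
        for (const other of others) {
          other.onAppend((entry) =>
            this.mine.append(JSON.stringify({ from: other.key, entry }))
          );
        }
      }

      post(text: string) {
        // I only ever write to my own archive.
        return this.mine.append(JSON.stringify({ from: this.mine.key, entry: text }));
      }
    }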

Also, if I have offline support, like Workbox Background Sync, I don't even need the internet: information transfers device to device with just an offline PWA. At least that's my goal.



