As a historical note, there used to be quite a few very popular systems for supporting early social networks over intermittent connections.
UUCP [https://en.wikipedia.org/wiki/UUCP] used the computers' modems to dial out to other computers, establishing temporary, point-to-point links between them. Each system in a UUCP network had a list of neighbor systems, with phone numbers, login names, passwords, etc.
FidoNet [https://en.wikipedia.org/wiki/FidoNet] was a very popular alternative to the internet in Russia as late as the 1990s. It used temporary modem connections to exchange private (email) and public (forum) messages between the BBSes in the network.
In Russia, there was a somewhat eccentric, very outspoken enthusiast of upgrading FidoNet to use web protocols and capabilities. Apparently, he's still active in developing "Fido 2.0": https://github.com/Mithgol
For those who weren't around, Usenet was built on UUCP in the early '80s. As messages were store-and-forward, you had to wait a good while for your messages to propagate - many servers only connected daily! Oh, and you'd better set cron to dial in often, as messages didn't stay in the spool too long!
Usenet back then was spam-free, and you could usually end up talking to the creators of whatever you were discussing. I rather miss it.
Quite a few tech companies used private newsgroups for support, so you'd dial into those separately. As they were often techie to techie they worked rather well.
I first came across Usenet and UUCP via the Amiga Developer programme. Amicron and UUCP overnight all seemed a bit magic back in '87 compared to dialing into non-networked BBSes to browse, very, very slowly!
Or eternal-september. Free accounts, although I don't think you get access to the binaries groups. I mostly use it for comp.risks, comp.arch.embedded, and some other things like that.
I'm the one who brought FidoNet to Russia (the Soviet Union, to be precise) in 1990. I remember how hard it was to find two more guys with modems and access to an automatic international line in order to request a separate FidoNet region for the USSR. Finally we got the 2:50 region code in September 1990, and there were three of us - two guys from Novosibirsk and one from Yekaterinburg, both large cities in the Asian part of the USSR.
For those of us raised in the Soviet Union, it was an eye-opening experience that you could freely exchange messages with people around the globe.
It's slightly misleading to refer to Fidonet only in the context of Russia. It was popular in quite a lot of places around the world, not just Russia. Not even principally Russia, in its heyday.
These things are definitely systems to learn from, both their architectures and their histories; and people have already been drawing parallels to Usenet on this very page, notice.
When I said that FidoNet was a very popular alternative to the internet in Russia as late as the 1990s, I didn't mean that it was limited to Russia, but that in Russia particularly (well, the FSU) it was still popular even in the late '90s, while elsewhere in the world it was subsumed by the internet.
Yep, my connection back then was UUCP on an Atari ST running off a 720k floppy (no hard drive), through a 1200 baud modem, running a mix of ported GNU utilities and Atari software to get Usenet and email. My email address was a bang path ...
In the very early 90s, my personal computer had a UUCP feed for email and news from a local BBS. I didn't use bang paths on it; the provider had a proper Internet connection. It worked satisfactorily well.
Early email providers in Argentina used the Unix-to-Unix Copy Program (UUCP) to forward your messages to some of the few internet-connected servers. IIRC you would write email in Pegasus or Eudora and then upload/download your mail to a UUCP server with a 9600 bps modem.
This sounds like what I wanted from GNU Social when I first joined over a year ago. GNU Social/Mastodon is a fun idea, but it falls apart when you realise that you still don't own your content, it's functionally impossible to switch nodes as advertised, and federation is a giant mess.
I tried to switch what server my account was on halfway through my GNU Social life, and you just can't: all your followers and all your tweets are on the old server, and there is no way to say "I'm still the same person". I didn't realise I wanted cryptographic identities and accounts until I tried to actually use the alternative.
That's also part of the interest I have in something like Urbit, which has an identity system centered on public keys forming a web of trust, which also lets you have a reputation system and ban spammers which you can't do easily with a pure DHT.
Not being able to switch nodes pushes you to try and host your own instead. That's what I've done. IMO we should instead be looking at packaging a self-hosted version into a native Windows and Mac app. Run it in the background and everything's done.
This is what I want. I've been wanting to build something like this for a long time. Something where I own my data. I can back it up and if my laptop gets stolen, I just import my data on a new machine and we're good to go.
The challenges that I see:
- Making it easy for any user to get up and running.
- De-authenticating old devices.
- Making it available from any mobile device.
I bought a VPS from CloudAtCost for a one-time fee of $35. Set it up with nginx and GNU Social and pretty much haven't looked back since. My instance is https://kwat.chat.
Ideally, the end solution would be dead simple. Download the Windows app, run it, put your credit card in if you need a URL registered, and it does everything, including daily backups to a folder on your disk.
And then you remember you set up an automated encrypted backup to the cloud, and thank your past self. If you're not doing this, now is the time to set it up.
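For what it's worth, the basic shape of such a backup doesn't need much. Here is a minimal sketch in Python, assuming the `cryptography` package is installed and that some cloud client already syncs a folder; the paths, file names, and the choice of Fernet are all my own illustrative assumptions, not anything from the comment above:

```python
# A minimal sketch of an automated encrypted backup (illustrative paths/names).
import io
import tarfile
import time
from pathlib import Path

from cryptography.fernet import Fernet

DATA_DIR = Path.home() / "ssb-data"     # hypothetical: whatever you want to back up
DEST_DIR = Path.home() / "CloudSync"    # hypothetical: a folder a cloud client syncs
KEY_FILE = Path.home() / ".backup.key"  # keep a copy of this key somewhere safe!

def load_or_create_key() -> bytes:
    if KEY_FILE.exists():
        return KEY_FILE.read_bytes()
    key = Fernet.generate_key()
    KEY_FILE.write_bytes(key)
    return key

def backup() -> Path:
    # Tar the data directory in memory, then encrypt the whole archive.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(str(DATA_DIR), arcname=DATA_DIR.name)
    token = Fernet(load_or_create_key()).encrypt(buf.getvalue())
    out = DEST_DIR / f"backup-{time.strftime('%Y%m%d')}.tar.gz.enc"
    out.write_bytes(token)
    return out

if __name__ == "__main__":
    print("wrote", backup())
```

Run it from cron (or a scheduled task) and the encrypted archive lands in the synced folder without the provider ever seeing plaintext.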
Why, oh why, do we have smart individuals replacing "the internet" with the word for the visual analogy we used to represent the internet to people in positions of power who didn't know better?
This is not a damn cloud! It's a remote computer you can access over the internet. Can we stop with this marketing lingo, please?
Sorry buddy, that ship has already sailed. Terms become popular because they are useful. We all know cloud means "a remote computer accessed over the internet", but that is a rather cumbersome phrase. "Cloud" says it in 5 characters. Can you suggest a better term?
Terms become popular because they're useful, but useful to whom?
The purpose of the original coinage of the word "cloud" was to obfuscate that you really meant "someone else's computer". It gives a nice warm, fuzzy decentralised impression - clouds are natural and ubiquitous! No one owns them! If it's in "the cloud" (note the definite article) then it's safe in the very fabric of the network, right?
Nope. It's in Larry and Sergey's basement. Not decentralised at all. Just somewhere else.
The proper term is "server", "datacenter", or "network", depending on what you're actually trying not to say.
If I understand Scuttlebutt correctly, your stuff is broadcast to whomever it may concern and to the pubs. If you still have your private key, you should still be able to access whatever is in the ether, right? You somehow become the recipient of your own messages. The only problem is that the thief will also have access to your private key, so the account has to be considered compromised.
This is a valid point, and I don't trust my computer. I would, however, trust a Ledger wallet (http://ledgerwallet.com/), and it's theoretically and economically feasible to have a Ledger wallet app sign every SSB message. That would be awesome to have.
So a data breach means all your private data is irrevocably publicized.
What percentage of users do you think would be affected by such cases? If it's something over 0.001%, it's a huge problem for a social network.
Sites like Coinbase and Github exist because they re-centralize distributed systems — users don't trust themselves to host their own data securely.
Alternatively, if this isn't a problem, why don't users simply host all their own infrastructure for existing tech problems today?
I'm sure someone capable of living in the Mojave Desert is capable of hosting their own infrastructure - is this network simply for those people, or is it also for journalists, trans people, and HR professionals?
99% of users either don't have backups of their precious photos, or only do because they blindly clicked through "set up iCloud" or their Google/Android or Windows equivalents.
Frankly I trust Google and Facebook more than I do myself with regard to backups. I know it will eventually burn me, but I've lost or misplaced my backups, or the key to them, more than once.
I'm probably in the minority being so irresponsible with my own backups, but I'm not alone.
Google and Facebook have a lot on the line with regard to user trust of their reliability. Also, they can't monetize data that they've lost.
But goosebook and facegle make backups for their own sake. You're still tied by vendor lock-in and can be locked out of your own data on a whim and prevented from switching to another service provider. They can and do monetize data that users have been locked out of.
I'd rather have my own backup copies and take responsibility myself. The scenario you evoke here would not happen if you had a proper backup strategy: two is one and one is none.
The 'cloud backup' part could still be abstracted away, using integrations with common providers (dropbox, onedrive, gdrive, etc). It would be a configuration step, but one that has pretty obvious benefits to the user, so maybe they'd be likely to supply their credentials for that.
Maybe but the person's point stands. Facebook has multiple datacenters with probably some kind of backups. It keeps things for years at a time even when it doesn't need to. It likes to because it helps the business model. Hardly anyone's pics and stuff will disappear.
Compare that to their experience at home with personal gear. Many like the convenience and reliability of Facebook over their own technical skills or efforts. You'd have to convince those people... a shitload of people... that they should start handling IT on their own. Also note that there are many good, smart, interesting, and so on people who simply don't do tech. Anyone filtering non-technical or procrastinating people out of a service will be throwing out lots of folks whose company they might otherwise enjoy.
So, these kinds of issues are worth exploring when trying to build a better social network.
It only takes one time of having your account locked by facebook or google and said people are automatically convinced of the obvious advantage of being in control of your own data.
Same with backups, lose your data to drive failure or theft once and suddenly having a backup strategy becomes a priority.
But as long as they have not been bitten once they don't care enough to actually do something proactive.
They usually resolve those lockouts and get access to their data back. Their computers getting trashed by malware or breaking is different. It can cost money to do recovery that might give them nothing. That concern is the more common case.
SSB's central premise -- distributed users, making an ad-hoc network connection whenever they are physically close, or perhaps have some network connection -- bakes in an assumption that a user's ability to connect to the network is sporadic.
It seems like the system would work just as well for people who decide to turn their system off when they go to work, or are on a sailboat. Of course it's not convenient in the same way that always-on social networks are, but that seems to be specifically not the point of SSB.
How about this: you buy a physical device at Wal-Mart for $29.99, plug it in, hook it up to your wifi and leave it plugged into an outlet. It's got Mastodon or GNU Social on it and could look like this, but branded: http://thegadgetflow.com/wp-content/uploads/2015/10/SmartPlu...
And then my internet connection goes down. Power goes out. I run over my data cap for the month. Comcast shuts me down for running a home server. My home network gets DDoS'd. I miss a patch day and I get hacked.
None of these things are a concern on traditional social networks. They have to be solved before the world has any chance of moving to a decentralized network.
It doesn't matter how you feel about it; take a look at the people complaining when Google put a news article about Facebook at the top of the results instead of the Facebook login page:
These are people who typed "Facebook Login" into a Google search, clicked the first result without reading, and got confused. Now tell these same users that Comcast blocked their social network or that they can't log in on their phone because their home Internet connection is down.
If you want a social network filled with just people like you and me, look at App.net or GNU Social for inspiration. If you want average users to sign in, these issues absolutely do have to be solved.
"he early web (when I entered and before) wasn't for everyone. And thats OK for me. Actually I think it is a good way to start."
It got where it went by doing the opposite of what you're suggesting. The smart elites in their walled gardens were mostly working on OSI, from what old-timers tell me. TCP/IP, SMTP, etc. involved lots of hackers trying to avoid doing too much work - much like the users you prefer to filter out. Then it just went from there, getting bigger and bigger due to the low barrier to entry. Tons of economic benefits and business models followed. Now we're talking to each other on it.
The early web was devoid of average users. Then the flow of newcomers and users who didn't know better surpassed the old-timers and knowledgeable users.
Then we entered a race to the bottom, toward a web tailored to their needs, because they're the large majority.
What do you call traditional social networks? To me a traditional social network is an AFK thing.
If your internet connection goes down, your power goes out, or you get DDoS'd, that would hinder your ability to use any third-party online service anyway.
The data cap and restrictive ISP terms of service are a different problem that would be challenged and fixed if internet subscribers went the p2p, self-hosted way. The commercial ISP situation is a terrible mess right now.
If you got hacked: unplug from the network, boot from recovery, restore from backup, and you're back online in less time than it takes to recover a hacked Facebook account.
You say decentralized but it seems to me you meant distributed here.
It's not that hard for the hypothetical manufacturer to set `git pull` and `apt-get update && apt-get upgrade -y` to run on a cronjob every morning at 4am.
Putting another barrier in front of it is not what's needed for people to use it. Treat it like email (or heck, early Facebook): get big clusters of users in by convincing universities and colleges to run a campus server. Businesses would also be a good idea, but a harder sell.
Then I have yet another device permanently plugged in and running, at a time when I, and frankly all of us, should be trying to reduce our energy consumption.
I'd rather see a universal single consumer server with easy download and plugins for all of this stuff. Host my social network, my mail server, my cloud apps, etc. Basically, make social network a part of OwnCloud and sell OwnCloud boxes. Instead of a million small devices, I do one big one... and "big" can still be RaspPi.
If the power consumption of a RPi for each household with someone like us is a major thing then I say we have come pretty far in reducing waste of energy. :-)
Ah yes, that number is from 2011, published by Google. Can't find the original. But it was widely reported[1].
Assuming that they're doubling energy consumption every year, they'd have reached 8GW in 2016. That's 8W per user if we assume 1 billion users. The energy usage of a Raspberry Pi is not insignificant relative to even this.
Doing things at scale is vastly more efficient. And only a subset of Google services can be relegated to a Raspberry Pi. Even if you host your own mail, are you ready to ditch the Google search index and YouTube?
> [...] which also lets you have a reputation system and ban spammers which you can't do easily with a pure DHT.
Sounds interesting. How can you ban spammers when they can just create a new public key/identity if their old one is banned? And also, what does "banning" even mean in a decentralized social network? I would assume it would be sufficient to just "unfollow" that particular identity.
It seems like you could get some of the efficiency gains of having lots of people on one node, but avoid the difficulty of moving, by setting up a new node for each person even if they are hosted on the same box. That way you can pick up the whole node with your account and move it, instead of trying to move your account from one node to another.
The Fejoa project (https://fejoa.org) actually targets this problem. It aims to make it possible for users to change their hosting server without losing their contacts.
I mean, my view on the switching-nodes thing is: it's not like you can just switch email addresses either. Sure, you can install a shim to forward everything, but there is no way to actually switch. And that is the design GNU Social uses.
It's a pain point with the entire design of federated services. On the other hand, the pain point with the monolithic services design of Facebook is I can't even talk to people on other services so...
I'd rather be able to email everybody and have it be annoying to switch than be able to only email people on my chosen provider (and then have to make an account on every service anyway).
> However, to get access to the DHT in the first place, you need to connect to a bootstrapping server, such as router.bittorrent.com:6881 or router.utorrent.com:6881
This is a common misunderstanding. You do not need to use those nodes to bootstrap. Most clients simply choose to because it is the most convenient way to do so on the given substrate (the internet). DHTs are in no way limited to specific bootstrap nodes, any node that can be contacted can be used to join the network, the protocol itself is truly distributed.
If the underlying network provides some hop-limited multicast or anycast a DHT could easily bootstrap via such queries. In fact, bittorrent clients already implement multicast neighbor discovery which under some circumstances can result in joining the DHT without any hardcoded bootstrap node.
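To make the multicast idea concrete, here is a rough Python sketch of LAN-local peer discovery in the spirit of what's described above. This is not the real BEP 14 wire format; the multicast group, port, and probe message are made up for illustration, and a cooperating listener on the LAN is assumed:

```python
# A rough sketch of multicast neighbor discovery for DHT bootstrap
# (illustrative group/port/messages, not an actual protocol).
import socket
import struct

GROUP, PORT = "239.255.77.77", 7777   # hypothetical multicast group and port
PROBE = b"DHT-BOOTSTRAP-PROBE"

def discover(timeout: float = 2.0) -> list:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 1))
    sock.settimeout(timeout)
    sock.sendto(PROBE, (GROUP, PORT))   # ask "any DHT nodes on this LAN?"
    peers = []
    try:
        while True:
            data, addr = sock.recvfrom(1024)
            if data.startswith(b"DHT-BOOTSTRAP-ACK"):
                peers.append(addr)      # each responder is a bootstrap candidate
    except socket.timeout:
        pass
    finally:
        sock.close()
    return peers
```

Any peer that answers can then be used as the entry point into the global DHT, no hardcoded bootstrap node required.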
I think you're being uncharitable in attributing a misunderstanding. The OP used the phrase "in the first place", and it's mostly correct that if you don't have any cached nodes (hence that phrase), the bootstrap nodes do in fact act as a single point of failure for you.
The multicast neighbor discovery is a neat idea. I wonder what percentage of clients/connections it results in successful bootstrapping for.
You can ship a client with a long list of "cached" nodes that were verified to be long-lived. I mean you need to obtain the client at some point, you can gather a fresh list of nodes along with it. From that point onward you keep your own cache fresh.
You could also run your own bootstrap node on an always-up server if downtimes making the lists stale is a concern.
You can also inject contacts when starting the client, you would have to obtain them out-of-band from somewhere of course, but it still does not require anything centralized.
If you're desperate you could also just sweep allocated IPv4 blocks and DHT-ping port 6881, you'll probably find one relatively fast. Of course that doesn't work with v6.
So there is no centralization and no single point of failure.
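For the "DHT-ping" part, the actual probe is tiny. Here is a small Python sketch of sending a BEP 5 KRPC ping to a candidate address and checking whether anything answers; the random node ID and the crude response check are my simplifications:

```python
# A sketch of pinging a candidate DHT node (BEP 5 KRPC "ping" query).
import os
import socket

def krpc_ping(ip: str, port: int = 6881, timeout: float = 2.0) -> bool:
    node_id = os.urandom(20)
    # Hand-rolled bencoding of {"t": "aa", "y": "q", "q": "ping", "a": {"id": node_id}}
    msg = b"d1:ad2:id20:" + node_id + b"e1:q4:ping1:t2:aa1:y1:qe"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(msg, (ip, port))
        data, _ = sock.recvfrom(1500)
        return b"1:y1:r" in data   # crude check for a KRPC response
    except (socket.timeout, OSError):
        return False
    finally:
        sock.close()

# e.g. krpc_ping("203.0.113.5")  # any node address obtained out-of-band
```

Anything that answers this ping is a live DHT node and can serve as your bootstrap contact.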
> The multicast neighbor discovery is a neat idea. I wonder what percentage of clients/connections it results in successful bootstrapping for.
It could work on a college campus, some conference network or occasionally some open wifi. Additionally there are some corporate bittorrent deployments where peer discovery via multicast can make sense.
If I understand TFA correctly scuttlebutt assumes(?) roaming through wifis and LANs. Those circumstances are ideal for multicast bootstrapping, so in principle the DHT can perform just as well as scuttlebutt, probably even better because once it has bootstrapped it can use the global DHT to keep contact with the network even if there is no lan-local peer to be discovered.
> You can ship a client with a long list of "cached" nodes that were verified to be long-lived. I mean you need to obtain the client at some point, you can gather a fresh list of nodes along with it. From that point onward you keep your own cache fresh.
There is no semantic difference between the two. The only difference is when you connect to the single-point-of-truth bootstrap, at download time (well, technically build-time) or at first startup time. And the latter probably gives you a more current, and not limited to long-lived nodes, thus better, answer.
> You could also run your own bootstrap node on an always-up server if downtimes making the lists stale is a concern.
Which itself needs to be bootstrapped. And once it is, it's equivalent to your local cache.
These are excellent ideas. Are any of them implemented? If I download e.g. uTorrent today and firewall off the hardcoded public bootstrap nodes, will it bootstrap?
> If I download e.g. uTorrent today and firewall off the hardcoded public bootstrap nodes, will it bootstrap?
Possibly, which mechanisms are used varies from client to client. Usually DHT bootstrap is not a primary goal but a side-effect of other mechanisms. Things that work in some clients:
magnet -> tracker -> peer -> dht ping
torrent -> tracker -> peer -> dht ping
magnet -> contains direct peer -> peer -> dht ping
torrent or magnet -> multicast discovery -> peer -> dht ping
torrent -> contains a list of dht node ip/port pairs
As you can see all but the last piggyback on regular torrent connections. But that's more because file transfers are the primary purpose and the DHT is not the raison d'etre of those implementations. If DHT connectivity were considered more important clients would also try more direct approaches.
> You can ship a client with a long list of "cached" nodes that were verified to be long-lived. I mean you need to obtain the client at some point, you can gather a fresh list of nodes along with it. From that point onward you keep your own cache fresh.
I believe this is how bitcoin works. Or at least it used to.
to me it always sounds like approaches like dht are the solution but i'm having difficulties diving into it for the purpose of implementing it for my own apps.
are there any noteworthy resources for non-academics to get started?
Well, for an in-depth understanding you will ultimately have to read the academic papers on specific DHT algorithms, but you don't have to be an academic to read academic papers, no? Besides that there are the usual resources for higher-level overview or gleaning some details: wikipedia, protocol specifications, toy implementations on github, stack overflow, various blog posts/articles that can be found via google.
But a DHT is usually just a low-level building block in more complex p2p systems. As its name says it's simply a distributed hash table. A data structure on a network. It just gives you a distributed key-value pair store where the values are often required to be small. In itself it doesn't give you trust, two-way communication, discovery or anything like that. Those are often either tacked on as ad-hoc features, handled by separate protocols or require some tricky cryptography.
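If it helps as a starting point, the core data-structure idea is small enough to show in a toy sketch: keys and node IDs live in the same space, and "closeness" is just XOR distance (Kademlia-style). This deliberately leaves out networking, replication, and trust, which is where all the real complexity lives:

```python
# A toy, in-memory illustration of the Kademlia-style "closest nodes" idea.
import hashlib

def node_id(name: str) -> int:
    # In real DHTs IDs are random or derived from keys; SHA-1 of a name is fine for a toy.
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def distance(a: int, b: int) -> int:
    return a ^ b   # XOR metric: smaller means "closer"

def closest_nodes(key: int, nodes: list, k: int = 3) -> list:
    # The k nodes "responsible" for storing/serving this key.
    return sorted(nodes, key=lambda n: distance(n, key))[:k]

nodes = [node_id(f"node-{i}") for i in range(50)]
key = node_id("some small value, e.g. a peer list")
print([hex(n)[:10] for n in closest_nodes(key, nodes)])
```

Everything else in a production DHT (routing tables, iterative lookups, replication, defenses) is built around making that simple rule work on an unreliable, adversarial network.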
Speaking as an academic who studies distributed systems, my advice is to stay away from anything that relies on a public DHT to work correctly. They're vulnerable to node churn, Sybil attacks, and routing attacks.
The last two are particularly devastating. Even if the peers had a key/value whitelist and hashes (e.g. like a .torrent file), an adversary can still insert itself into the routing tables of honest nodes and prevent peers from ever discovering your key/value pairs. Moreover, they can easily spy on everyone who tries to access them. It is estimated [1] that 300,000 of the BitTorrent DHT's nodes are Sybils, for example.
In practice none of those attacks have yet reached a level of concern for bittorrent developers to deploy serious countermeasures. Torrents generally are considered public data, especially those made available through the DHT, and provide peer exchange which allows near-complete extraction of peer lists anyway, so it hardly introduces any new privacy leaks. Although maintaining secrecy while exchanging data over public infrastructure is desirable, that can be achieved by encrypting the payload instead of obscuring the fact that you participated in the network at all.
BEP42[0] has been implemented by many clients, and yet nobody has felt the need to actually switch to enforcement mode.
All that is the result of the bittorrent DHT being a low-value target. It does not contain any juicy information and is just one of multiple peer discovery mechanisms, so there's some redundancy too.
> Although maintaining secrecy while exchanging data over public infrastructure is desirable, that can be achieved by encrypting the payload instead of obscuring the fact that you participated in the network at all.
If I'm "in" on the sharing, then I learn the IP addresses (and ISPs and proximate locations) of the other people downloading the shared file. Moreover, if I control the right hash buckets in the DHT's key space, I can learn from routing queries who's looking for the content (even if they haven't begun to share it yet). Encryption alone does not make file-sharing a private affair.
> BEP42[0] has been implemented by many clients, and yet nobody has felt the need to actually switch to enforcement mode.
It also does not appear to solve the problem. The attacker only needs to get control of hash buckets to launch routing attacks. Even with a small number of unchanging node IDs, the attacker is still free to insert a pathological sequence of key/value pairs to bump hash buckets from other nodes to them.
> All that is the result of the bittorrent DHT being a low-value target. It does not contain any juicy information and is just one of multiple peer discovery mechanisms, so there's some redundancy too.
Are you suggesting that high-value apps should not rely on a DHT, then?
> Encryption alone does not make file-sharing a private affair.
Someone who is "in" on encrypted content can observe the swarm anyway, thus gains very little from performing snooping on a DHT. On the other hand a passive DHT observer who is not "in" will be hampered by not knowing what content is shared, he only sees participation in opaque hashes. Additionally payload encryption adds deniability because anyone can transfer the ciphertext but participants won't know whether others have the necessary keys to decrypt it.
What I'm saying is that any information leakage via the DHT (compared to public trackers and PEX) is quite small, and this small loss can be more than made up by adding payload encryption.
> the attacker is still free to insert a pathological sequence of key/value pairs to bump hash buckets from other nodes to them.
There is no bumping in kademlia with unbounded node storage. And clients with limited storage can make bumping very hard for others with oldest-first and one-per-subnet policies, i.e. bumping the attackers instead of genuine keys.
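As a concrete illustration of those policies, here is a short Python sketch of a k-bucket insert rule that admits at most one entry per /24 and never bumps older entries; capacities and the subnet granularity are my own illustrative choices:

```python
# A sketch of "oldest-first" and "one-per-subnet" bucket policies.
from dataclasses import dataclass, field

@dataclass
class Entry:
    ip: str
    node_id: bytes
    first_seen: float

@dataclass
class Bucket:
    capacity: int = 8
    entries: list = field(default_factory=list)

    def try_insert(self, candidate: Entry) -> bool:
        subnet = ".".join(candidate.ip.split(".")[:3])            # /24 prefix
        if any(".".join(e.ip.split(".")[:3]) == subnet for e in self.entries):
            return False                                          # one entry per subnet
        if len(self.entries) < self.capacity:
            self.entries.append(candidate)
            return True
        return False   # bucket full: established (older) entries are never bumped
```

Under these rules an attacker who shows up late, from a small number of subnets, mostly ends up bumping his own candidates rather than the genuine ones.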
> Are you suggesting that high-value apps should not rely on a DHT, then?
No, they should use DHT as a bootstrap mechanism of easy-to-replicate, difficult-to-disrupt small bits of information (e.g. peer contacts as in bittorrent) which then run their own content-specific gossip network for the critical content. In some contexts it can also make sense to make reverse lookups difficult, so attackers won't know what to disrupt unless they're already part of some group.
> Someone who is "in" on encrypted content can observe the swarm anyway, thus gains very little from performing snooping on a DHT.
I can see that this thread is getting specific to Bittorrent, and away from DHTs in general. Regardless, I'm not sure if this is the case. Please correct me if I'm wrong:
* If I can watch requests on even a single copy of a single key/value pair in the DHT, I can learn some of the IP addresses asking for it (and when they ask for it).
* If I can watch requests on all copies of the key/value pair, then I can learn all the interested IP addresses and the times when they ask.
* If I can do this for the key/value pairs that make up a .torrent file, then I can (1) get the entire .torrent file and learn the list of file hashes, and (2) find out the IPs who are interested in the .torrent file.
* If I can then observe any of the key/value pairs for the .torrent file hashes, then I can learn which IPs are interested in and can serve the encrypted data (and the times at which they do so).
This does not strike me as "quite small," but that's semantics.
> There is no bumping in kademlia with unbounded node storage. And clients with limited storage can make bumping very hard for others with oldest-first and one-per-subnet policies, i.e. bumping the attackers instead of genuine keys.
Yes, the DHT nodes can employ heuristics to try to stop this, just like how BEP42 is a heuristic to thwart Sybils. But that's not the same as solving the problem. Applications that need to be reliable have to be aware of these limits, and anticipate them in their design.
> No, they should use DHT as a bootstrap mechanism of easy-to-replicate, difficult-to-disrupt small bits of information (e.g. peer contacts as in bittorrent) which then run their own content-specific gossip network for the critical content. In some contexts it can also make sense to make reverse lookups difficult, so attackers won't know what to disrupt unless they're already part of some group.
This kind of proves my point. You're recommending that applications not rely on DHTs, but instead use their own content-specific gossip network.
To be fair, I'm perfectly okay with using DHTs as one of a family of solutions for addressing one-off or non-critical storage problems (like bootstrapping). But the point I'm trying to make is that they're not good for much else, and developers need to be aware of these limits if they want to use a DHT for anything.
> This does not strike me as "quite small," but that's semantics.
It is quite small because bittorrent needs to use some peer source. If you're not using the DHT you're using a tracker. The same information that can be obtained from the DHT can be obtained from trackers. So there's no novel information leakage introduced by the DHT.
That's why the DHT does not really pose a big information leak.
> This kind of proves my point. You're recommending that applications not rely on DHTs, but instead use their own content-specific gossip network.
That's not what I said. Relying on a DHT for some parts, such as bootstrap and discovery is still... well... relying on it, for things it is good at.
> But the point I'm trying to make is that they're not good for much else, and developers need to be aware of these limits if they want to use a DHT for anything.
Well yes, but these limits arise naturally anyway since A stores data for B on C and you can't really incentivize C to manage anything more than small bits of data.
> I can see that this thread is getting specific to Bittorrent
About DHTs in general: you can easily make reverse lookups difficult or impossible by hashing the keys (bittorrent doesn't because the inputs already are hashes), and you can obfuscate lookups by making them somewhat off-target until they're close to the target and by making data lookups and maintenance lookups indistinguishable. You can further add plausible deniability by replaying recently seen lookups when doing maintenance of nearby buckets.
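The key-hashing part is the simplest of those to show. A tiny Python sketch of the idea, with a plain dict standing in for the real overlay (my simplification):

```python
# Store under sha1(application_key) so a passive observer of the DHT only
# ever sees opaque hashes; only someone who already knows app_key can ask for it.
import hashlib

def dht_key(app_key: bytes) -> bytes:
    return hashlib.sha1(app_key).digest()

def put(dht: dict, app_key: bytes, value: bytes) -> None:
    dht[dht_key(app_key)] = value       # `dht` stands in for the real overlay

def get(dht: dict, app_key: bytes):
    return dht.get(dht_key(app_key))    # reverse lookup from stored key to app_key is infeasible
```

The off-target and decoy lookups mentioned above then make it harder to tell, from traffic alone, which of those hashes a given peer actually cares about.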
> It is quite small because bittorrent needs to use some peer source. If you're not using the DHT you're using a tracker. The same information that can be obtained from the DHT can be obtained from trackers. So there's no novel information leakage introduced by the DHT.
Replacing a tracker with a DHT trades having one server with all peer and chunk knowledge with N servers with partial peer and chunk knowledge. If the goal is to stop unwanted eavesdroppers, then the choice is between (1) trusting that a single server that knows everything will not divulge information, or (2) trusting that an unknown, dynamic number of servers that anyone can run (including the unwanted eavesdroppers) will not divulge partial information.
The paper I linked up the thread indicates that unwanted eavesdroppers can learn a lot about the peers with choice (2) by exploiting the ways DHTs operate. Heuristics can slow this down, but not stop it. With choice (1), it is possible to fully stop unwanted eavesdroppers if peers can trust the tracker and communicate with it confidentially. There is no such possibility with choice (2) if the eavesdropper can run DHT nodes.
> That's not what I said. Relying on a DHT for some parts, such as bootstrap and discovery is still... well... relying on it, for things it is good at.
> Well yes, but these limits arise naturally anyway since A stores data for B on C and you can't really incentivize C to manage anything more than small bits of data.
Thank you for clarifying. Would you agree that reliable bootstrapping and reliable steady-state behavior are two separate concerns in the application? I'm mainly concerned with the latter; I would never make an application's steady-state behavior dependent on a DHT's ability to keep data available. In addition, bootstrapping information like initial peers and network settings can be obtained through other channels (e.g. DNS servers, user-given configuration, multicasting), which further decreases the need to rely on DHTs.
> About DHTs in general: you can easily make reverse lookups difficult or impossible by hashing the keys (bittorrent doesn't because the inputs already are hashes), and you can obfuscate lookups by making them somewhat off-target until they're close to the target and by making data lookups and maintenance lookups indistinguishable. You can further add plausible deniability by replaying recently seen lookups when doing maintenance of nearby buckets.
I'm not quite sure what you're saying here, but it sounds like you're saying that a peer can obfuscate lookups by adding "noise" (e.g. doing additional, unnecessary lookups). If so, then my reply would be this only increases the number of samples an eavesdropper needs to make to unmask a peer. To truly stop an eavesdropper, a peer needs to ensure that queries are uniformly distributed in both space and time. This would significantly slow down the peer's queries and consume a lot of network bandwidth, but it would stop the eavesdropper. I don't know of any production system that does this.
> If the goal is to stop unwanted eavesdroppers, then the choice is between (1) trusting that a single server that knows everything will not divulge information
In practice trackers do divulge all the same information that can be gleaned from the DHT and so does PEX in a bittorrent swarm. Those are far more convenient to harvest.
> I'm not quite sure what you're saying here, but it sounds like you're saying that a peer can obfuscate lookups by adding "noise" (e.g. doing additional, unnecessary lookups).
That's only 2 of the 4 measures I listed. And I would mention encryption again as a 5th. The others: a) opportunistically creating decoys by having others repeat lookups they have recently seen as part of their routing table maintenance; b) storing data in the DHT in a way that requires some prior knowledge to be useful, which will ideally result in only leaking information when the listener could have obtained it anyway, given that prior knowledge.
There's a lot you can do to harden DHTs. I agree that naive implementations are trivial to attack, but to my knowledge it is possible to achieve byzantine fault tolerance in a DHT in principle, it's just that nobody has actually needed that level of defense yet, attacks in the wild tend to be fairly primitive and only succeed because some implementations are very sloppy about sanitizing things.
> To truly stop an eavesdropper, a peer needs to ensure that queries are uniformly distributed in both space and time.
Not quite. You only need to increase the number of samples needed beyond the number of samples a peer is likely to generate during some lifecycle, and that is not just done by adding more traffic.
> Would you agree that reliable bootstrapping and reliable steady-state behavior are two separate concerns in the application?
Certainly, but bootstrapping is a task that you do more frequently than you think. You don't just join a global overlay once; you also (re)join many sub-networks throughout each session or look for specific nodes. A DHT is a bit like DNS. You only need it once a day for a domain (assuming long TTLs), and it's not exactly the most secure protocol, and afterwards you do the heavy authentication lifting with TLS, but DNS is still important, even if you're not spending lots of traffic on it.
> In practice trackers do divulge all the same information that can be gleaned from the DHT and so does PEX in a bittorrent swarm. Those are far more convenient to harvest.
I'm confused. I can configure a tracker to only communicate with trusted peers, and do so over a confidential channel. The tracker is assumed to not leak peer information to external parties. A DHT can do neither of these.
> That's only 2 of the 4 measures I listed. And I would mention encryption again as a 5th. The others: a) opportunistically creating decoys by having others repeat lookups they have recently seen as part of their routing table maintenance; b) storing data in the DHT in a way that requires some prior knowledge to be useful, which will ideally result in only leaking information when the listener could have obtained it anyway, given that prior knowledge.
Unless the externally-observed schedule of key/value requests is statistically random in time and space, the eavesdropper can learn with better-than-random guessing which peers ask for which chunks. Neither (a) nor (b) address this; they simply increase the number of samples required.
> There's a lot you can do to harden DHTs. I agree that naive implementations are trivial to attack, but to my knowledge it is possible to achieve byzantine fault tolerance in a DHT in principle, it's just that nobody has actually needed that level of defense yet, attacks in the wild tend to be fairly primitive and only succeed because some implementations are very sloppy about sanitizing things.
First, no system can tolerate Byzantine faults if over a third of its nodes are hostile. If I can Sybil a DHT, then I can spin up arbitrarily many evil nodes. Are we assuming that no more than one third of the DHT's nodes are evil?
Second, "nobody has actually needed that level of defense yet" does not mean that it is a sound decision for an application to use a DHT with the expectation that the problems will never occur. So the maxim goes, "it isn't a problem, until it is." As an application developer, I want to be prepared for what happens when it is a problem, especially since the problems are known to exist and feasible to exacerbate.
> Not quite. You only need to increase the number of samples needed beyond the number of samples a peer is likely to generate during some lifecycle, and that is not just done by adding more traffic.
I'm assuming that peers are arbitrarily long-lived. Real-world distributed systems like BitTorrent and Bitcoin aspire to this.
> Certainly, but bootstrapping is a task that you do more frequently than you think. You don't just join a global overlay once; you also (re)join many sub-networks throughout each session or look for specific nodes. A DHT is a bit like DNS. You only need it once a day for a domain (assuming long TTLs), and it's not exactly the most secure protocol, and afterwards you do the heavy authentication lifting with TLS, but DNS is still important, even if you're not spending lots of traffic on it.
I take issue with saying that "DHTs are like DNS", because they offer fundamentally different data consistency guarantees and availability guarantees (even Beehive (DNS over DHTs) is vulnerable to DHT attacks that do not affect DNS).
Regardless, I'm okay with using a DHT as one of many supported bootstrapping mechanisms. I'm not okay with using it as the sole mechanism or even the primary mechanism, since they're so easy to break when compared to other mechanisms.
> I'm confused. I can configure a tracker to only communicate with trusted peers, and do so over a confidential channel. The tracker is assumed to not leak peer information to external parties. A DHT can do neither of these.
But then you are running a private tracker for personal/closed group use and have a trust source. If you have a trust source you could also run a closed DHT. But the bittorrent DHT is public infrastructure and best compared to public trackers.
> I'm assuming that peers are arbitrarily long-lived. Real-world distributed systems like BitTorrent and Bitcoin aspire to this.
Physical machines are. Their identities (node IDs, IP addresses) and the content they participate in at any given time don't need to be.
> If I can Sybil a DHT, then I can spin up arbitrarily many evil nodes.
This can be made costly. In the extreme case you could require a bitcoin-like proof of work system for node identities. But that would be wasteful... unless you're running some coin network anyway, then you can tie your ID generation to that. In lower-value targets IP prefixes tend to be costly enough to thwart attackers. If an attacker can muster the resources to beat that he would also have enough unique machines at his disposal to perform a DoS on more centralized things.
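To illustrate the "costly identities" idea generically (this is the plain proof-of-work notion from the comment, not BEP 42 and not any deployed scheme), a node could be required to present a nonce such that hashing its public key together with the nonce yields a number of leading zero bits:

```python
# A sketch of proof-of-work node identities: generating one is expensive,
# verifying one is cheap. Difficulty and derivation are illustrative choices.
import hashlib
import itertools
import os

DIFFICULTY_BITS = 20   # each extra bit doubles the expected generation cost

def generate_identity(pubkey: bytes):
    for nonce in itertools.count():
        digest = hashlib.sha256(pubkey + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0:
            return digest[:20], nonce   # node ID plus the nonce that proves the work

def verify_identity(pubkey: bytes, node_id: bytes, nonce: int) -> bool:
    digest = hashlib.sha256(pubkey + nonce.to_bytes(8, "big")).digest()
    return (digest[:20] == node_id
            and int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0)

# node_id, nonce = generate_identity(os.urandom(32))  # roughly 2**20 hashes of work
```

Spinning up thousands of Sybil identities then costs real compute, while honest nodes pay the price once.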
> Are we assuming that no more than one third of the DHT's nodes are evil?
Assuming is the wrong word. I think approaching BFT is simply part of what you do to harden a DHT against attackers.
> Second, "nobody has actually needed that level of defense yet" does not mean that it is a sound decision for an application to use a DHT with the expectation that the problems will never occur.
I haven't said that. I'm saying that simply because this kind of defense was not yet needed, nobody tried to build it; as simple as that. Sophisticated security comes with implementation complexity; that's why we had HTTP for ages before HTTPS adoption was spurred by the Snowden leaks.
> Neither (a) nor (b) address this; they simply increase the number of samples required.
(b) is orthogonal to sampling vs. noise.
> I'm not okay with using it as the sole mechanism or even the primary mechanism, since they're so easy to break when compared to other mechanisms.
What other mechanisms do you have in mind? Most that I am aware of don't offer the same O(log n) node-state and lookup complexity in a distributed manner.
> But then you are running a private tracker for personal/closed group use and have a trust source. If you have a trust source you could also run a closed DHT. But the bittorrent DHT is public infrastructure and best compared to public trackers.
You're ignoring the fact that with a public DHT, the eavesdropper has the power to reroute requests through networks (s)he can already watch. With a public tracker, the eavesdropper needs vantage points in the tracker's network to gain the same insights.
If we're going to do an apples-to-apples comparison between a public tracker and a public DHT, then I'd argue that they are equivalent only if:
(1) the eavesdropper cannot add or remove nodes in the DHT;
(2) the eavesdropper cannot influence other nodes' routing tables in a non-random way.
> This can be made costly. In the extreme case you could require a bitcoin-like proof of work system for node identities. But that would be wasteful... unless you're running some coin network anyway, then you can tie your ID generation to that. In lower-value targets IP prefixes tend to be costly enough to thwart attackers. If an attacker can muster the resources to beat that he would also have enough unique machines at his disposal to perform a DoS on more centralized things.
Funny you should mention this. At the company I work part-time for (blockstack.org), we thought of doing this very thing back when the system still used a DHT for storing routing information.
We had the additional advantage of having a content whitelist: each DHT key was the hash of its value, and each key was written to the blockchain. Blockstack ensured that each node calculated the same whitelist. This meant that inserting a key/value pair required a transaction, and the number of key/value pairs could grow no faster than the blockchain.
This was not enough to address data availability problems. First, the attacker would still have the power to push hash buckets onto attacker-controlled nodes (it would just be expensive). Second, the attacker could still join the DHT and censor individual routes by inserting itself as neighbors of the target key/value pair replicas.
The best solution we came up with was one whereby DHT node IDs would be derived from block headers (e.g. deterministic but unpredictable), and registering a new DHT node would require an expensive transaction with an ongoing proof-of-burn to keep it. In addition, our solution would have required that every K blocks, the DHT nodes deterministically re-shuffle their hash buckets among themselves in order to throw off any encroaching routing attacks.
We ultimately did not do this, however, because having the set of whitelisted keys growing at a fixed rate afforded a much more reliable solution: have each node host a 100% replica of the routing information, and have nodes arrange themselves into a K-regular graph where each node selects neighbors via a random walk and replicates missing routing information in rarest-first order. We have published details on this here: https://blog.blockstack.org/blockstack-core-v0-14-0-release-....
> Assuming is the wrong word. I think approaching BFT is simply part of what you do to harden a DHT against attackers.
If you go for BFT, you have to assume that no more than f of 3f+1 nodes are faulty. Otherwise, the malicious nodes will always be able to prevent the honest nodes from reaching agreement.
> I haven't said that. I'm saying that simply because this kind of defense was not yet needed, nobody tried to build it; as simple as that. Sophisticated security comes with implementation complexity; that's why we had HTTP for ages before HTTPS adoption was spurred by the Snowden leaks.
Right. HTTP's lack of security wasn't considered a problem, until it was. Websites addressed this by rolling out HTTPS in droves. I'm saying that in the distributed systems space, DHTs are the new HTTP.
> What other mechanisms do you have in mind? Most that I am aware of don't offer the same O(log n) node-state and lookup complexity in a distributed manner.
How about an ensemble of bootstrapping mechanisms? For example (a rough sketch combining a few of these follows the list):
* give the node a set of initial hard-coded neighbors, and maintain those neighbors yourself.
* have the node connect to an IRC channel you maintain and ask an IRC bot for some initial neighbors.
* have the node request a signed file from one of a set of mirrors that contains a list of neighbors.
* run a DNS server that lists currently known-healthy neighbors.
* maintain a global public node directory and ship it with the node download.
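Here is the rough sketch referenced above: try several independent sources and merge whatever they return. All the specifics (the DNS name, the mirror URL, the cache path, the hardcoded addresses) are placeholders I made up for illustration:

```python
# A sketch of ensemble bootstrapping: hardcoded peers + DNS + a mirror file + local cache.
import json
import socket
import urllib.request
from pathlib import Path

HARDCODED = [("198.51.100.10", 6881), ("198.51.100.11", 6881)]    # maintained by you
DNS_NAME = "bootstrap.example.org"                                 # A records of healthy nodes
MIRROR_URL = "https://example.org/nodes.json"                      # static/signed node list
CACHE = Path.home() / ".myapp" / "nodes.json"                      # your own fresh cache

def bootstrap_candidates() -> list:
    found = list(HARDCODED)
    try:
        _, _, ips = socket.gethostbyname_ex(DNS_NAME)
        found += [(ip, 6881) for ip in ips]
    except OSError:
        pass
    try:
        with urllib.request.urlopen(MIRROR_URL, timeout=5) as resp:
            found += [tuple(n) for n in json.load(resp)]
    except (OSError, ValueError):
        pass
    if CACHE.exists():
        found += [tuple(n) for n in json.loads(CACHE.read_text())]
    return list(dict.fromkeys(found))   # de-duplicate while keeping priority order
```

The point of the ensemble is that no single source is a point of failure: any one of them succeeding is enough to get the node connected.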
> You're ignoring the fact that with a public DHT, the eavesdropper has the power to reroute requests through networks (s)he can already watch.
But in the context of bittorrent that is not necessary if we're still talking about information leakage. The tracker + pex gives you the same, and more, information than watching the DHT.
> we thought of doing this very thing back when the system still used a DHT for storing routing information.
The approaches you list seem quite reasonable when you have a PoW system at your disposal.
> have each node host a 100% replica of the routing information, and have nodes arrange themselves into a K-regular graph
This is usually considered too expensive in the context of non-coin/-blockchain p2p networks because you want nodes to be able to run on embedded and other resource-constrained devices. The O(log n) node state and bootstrap cost limits are quite important. Otherwise it would be akin to asking every mobile phone to keep up to date with the full BGP route set.
> assume that no more than f of 3f+1 nodes are faulty. Otherwise, the malicious nodes will always be able to prevent the honest nodes from reaching agreement.
Of course, but for some applications that is more than good enough. If your adversary can bring enough resources to bear to take over 1/3rd of your network he might as well DoS any target he wants. So you would be facing massive disruption anyway. I mean blockchains lose some of their security guarantees too once someone manages to dominate 1/2 of the mining capacity. Same order of magnitude. It's basically the design domain "secure, up to point X".
> I'm saying that in the distributed systems space, DHTs are the new HTTP.
I can agree with that, but I think the S can be tacked on once people feel the need.
> How about an ensemble of bootstrapping mechanisms?
The things you list don't really replace the purpose of a DHT. A DHT is a key-value store for many keys and a routing algorithm to find them in a distributed environment. What you listed just gives you a bunch of nodes, but no data lookup capabilities. Essentially you're listing things that could be used to bootstrap into a DHT, not replacements for the next-layer services provided by a DHT.
> This is usually considered too expensive in the context of non-coin/-blockchain p2p networks because you want nodes to be able to run on embedded and other resource-constrained devices. The O(log n) node state and bootstrap cost limits are quite important. Otherwise it would be akin to asking every mobile phone to keep up to date with the full BGP route set.
Funny you should mention BGP. We have been approached by researchers at Princeton who are interested in doing something like that, using Blockstack (but to be fair, they're more interested in giving each home router a copy of the global BGP state).
I totally hear you regarding the costly bootstrapping. In Blockstack, for example, we expect most nodes to sync up using a recent signed snapshot of the node state and then use SPV headers to download the most recent transactions. It's a difference between minutes and days for booting up.
> Of course, but for some applications that is more than good enough. If your adversary can bring enough resources to bear to take over 1/3rd of your network he might as well DoS any target he wants. So you would be facing massive disruption anyway.
Yes. The reason I brought this up is that in the context of public DHTs, it's feasible for someone to run many Sybil nodes. There's some very recent work out of MIT for achieving BFT consensus in open-membership systems, if you're interested: https://arxiv.org/pdf/1607.01341.pdf
> I mean blockchains lose some of their security guarantees too once someone manages to dominate 1/2 of the mining capacity. Same order of magnitude. It's basically the design domain "secure, up to point X".
In Bitcoin specifically, the threshold for tolerating Byzantine miners is 25% hash power. This was one of the more subtle findings from Eyal and Sirer's selfish mining paper.
> The things you list don't really replace the purpose of a DHT. A DHT is a key-value store for many keys and a routing algorithm to find them in a distributed environment. What you listed just gives you a bunch of nodes, but no data lookup capabilities. Essentially you're listing things that could be used to bootstrap into a DHT, not replacements for the next-layer services provided by a DHT.
If the p2p application's steady-state behavior is to run its own overlay network and use the DHT only for bootstrapping, then DHT dependency can be removed simply by using the systems that bootstrap the DHT in order to bootstrap the application. Why use a middle-man when you don't have to?
> If the p2p application's steady-state behavior is to run its own overlay network and use the DHT only for bootstrapping, then DHT dependency can be removed simply by using the systems that bootstrap the DHT in order to bootstrap the application. Why use a middle-man when you don't have to?
It seems like we have quite different understandings of how DHTs are used, probably shaped by different use cases. Let me see if I can summarize yours correctly: a) over time, nodes will be interested in or have visited a large proportion of the keyspace; b) it makes sense to eventually replicate the whole dataset; c) the data mutation rate is relatively low; d) access to the keyspace is extremely biased - there is some subset of keys that almost all nodes will access. Is that about right?
In my case this is very different. Node turnover is high (mean life time <24h), data is volatile (mean lifetime <2 hours), nodes are only ever interested in a tiny fraction of the keyspace (<0.1%), nodes access random subsets of the keyspace, so there's little overlap in their behavior. The data would become largely obsolete before you even replicated half the DHT unless you spent a lot of overhead on keeping up with hundreds of megabytes of churn per hour and you would never use most of it.
So for you there's just "bootstrap dataset" and then "expend a little effort to keep the whole replica fresh". For me there's really "bootstrap into the dht", "maintain (tiny) routing table" and then "read/write random access to volatile data on demand, many times a day".
This is why the solutions you propose are no solutions for a general DHT which can also cope with high churn.
> It seems like we have quite different understandings of how DHTs are used, probably shaped by different use cases. Let me see if I can summarize yours correctly: a) over time, nodes will be interested in or have visited a large proportion of the keyspace; b) it makes sense to eventually replicate the whole dataset; c) the data mutation rate is relatively low; d) access to the keyspace is extremely biased - there is some subset of keys that almost all nodes will access. Is that about right?
Agreed on (a), (b), and (c). In (a), the entire keyspace will be visited by each node, since they have to index the underlying blockchain in order to reach consensus on the state of the system (i.e. each Blockstack node is a replicated state machine, and the blockchain encodes the sequence of state-transitions each node must make). (d) is probably correct, but I don't have data to back it up (e.g. because of (b), a locally-running application node accesses its locally-hosted Blockstack data, so we don't ever see read accesses).
> In my case this is very different. Node turnover is high (mean life time <24h), data is volatile (mean lifetime <2 hours), nodes are only ever interested in a tiny fraction of the keyspace (<0.1%), nodes access random subsets of the keyspace, so there's little overlap in their behavior. The data would become largely obsolete before you even replicated half the DHT unless you spent a lot of overhead on keeping up with hundreds of megabytes of churn per hour and you would never use most of it.
Thank you for clarifying. Can you further characterize the distribution of reads and writes over the keyspace in your use case? (Not sure if you're referring to the BitTorrent DHT behavior in your description, so apologies if these questions are redundant.) For example:
* Are there a few keys that are really popular, or are keys equally likely to be read?
* Do nodes usually read their own keys, or do they usually read other nodes' keys?
* Is your DHT content-addressable (e.g. a key is the hash of its value)? If so, how do other nodes discover the keys they want to read?
* If your DHT is not content-addressable, how do you deal with inconsistent writes during a partition? More importantly, how do you know the value given back by a remote node is the "right" value for the key?
> Not sure if you're referring to the Bittorrent DHT
I am, but that's not even that important, because storing a blockchain history is a very special use case: you're dealing with an append-only data structure. There are no deletes or random writes. Any DHT used for p2p chat, file sharing or some mapping of identity -> network address will experience more write-heavy, random-access workloads.
> Are there a few keys that are really popular, or are keys equally likely to be read?
Yes, some are more popular than others, but the bias is not strong compared to the overall size of the network (8M+ nodes). Key popularity may range from 1 to maybe 20k, and such peaks are transient, mostly for new content.
> Do nodes usually read their own keys, or do they usually read other nodes' keys?
It is extremely unlikely that nodes are interested in the data for which they provide storage.
> Is your DHT content-addressable (e.g. a key is the hash of its value)?
Yes and no, it depends on the remote procedure call used. Generic immutable get/put operations are. Mutable ones use the hash of the pubkey. Peer address list lookups use the hash of an external value (from the torrent).
> * If your DHT is not content-addressable, how do you deal with inconsistent writes during a partition? More importantly, how do you know the value given back by a remote node is the "right" value for the key?
For peer lists it maintains a list of values from multiple originators; the value is the originator's IP, so it can't easily be spoofed (writes require a 3-way handshake). A store adds a single value, a get returns the list.
For mutable stores, the chain value -> signature -> pubkey -> DHT key is checked.
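To make that concrete, here's a rough sketch of that check in the style of a BEP44 mutable put (TypeScript with tweetnacl). The real BitTorrent DHT uses bencoded payloads and different field names, so treat this as the shape of the validation, not the actual implementation:

```typescript
import * as crypto from "crypto";
import * as nacl from "tweetnacl";

interface MutablePut {
  value: Buffer;         // payload being stored
  seq: number;           // monotonically increasing sequence number
  publicKey: Uint8Array; // writer's ed25519 public key
  signature: Uint8Array; // ed25519 signature over (seq, value)
}

// The DHT key is derived from the public key, so a writer can only ever
// occupy "its own" slot in the keyspace.
function dhtKeyFor(publicKey: Uint8Array): string {
  return crypto.createHash("sha1").update(Buffer.from(publicKey)).digest("hex");
}

function validateMutablePut(targetKey: string, put: MutablePut): boolean {
  // 1. the key must match the public key
  if (dhtKeyFor(put.publicKey) !== targetKey) return false;
  // 2. the signature must cover the sequence number and the value, so a
  //    replica cannot forge the data or roll it back to an older version
  //    (the byte layout here is illustrative, not BEP44's exact encoding)
  const signed = Buffer.concat([Buffer.from(`seq:${put.seq}:`), put.value]);
  return nacl.sign.detached.verify(signed, put.signature, put.publicKey);
}
```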
S/Kademlia does not solve this problem; it simply slows down the rate at which an adversary can attack the system by a small amount (i.e. by making node ID creation more expensive and increasing a key/value pair's number of replicas).
There are several DHT papers that talk about bootstrapping DHTs off of social networks. They all fail to solve the Sybil problem in the same way: an adversary simply attacks the social network by pretending to be many people.
Not everything needs a global singleton like a blockchain or DHT or a DNS system. Bitcoin needs this because of the double-spend problem. But private chats and other such activities don't.
I have been working on this problem since 2011. I can tell you that peer-to-peer is fine for asynchronous feeds that form tree based activities, which is quite a lot of things.
But everyday group activities usually require some central authority for that group, at least for the ordering of messages. A "group" can be as small as a chess game or one chat message and its replies. But we haven't solved mental poker well for N people yet. (Correct me if I am wrong.)
The goal isn't to not trust anyone for anything. After all, you still trust the user agent app on your device. The goal is to control where your data lives, and not have to rely on any particular connections to eg the global internet, to communicate.
Btw, it's ironic that the article ends with "If you liked this article, consider sharing (tweeting) it to your followers". In the feudal digital world we live in today, most people must speak in a mere 140 characters to "their" followers via a centralized social network with huge datacenters whose engineers post on highscalability.com.
If you are interested, here I talk about it further in depth:
I have been researching along these same lines for a while now as well, ad-hoc/mesh network messaging. My use case would be an amateur radio mesh network. For a while, I was investigating running matrix.org servers on raspberry pis, connected to a mesh network without internet. And that does work, the closest I've come to a great solution.
But I had never heard of Scuttlebutt until now. This looks even more ideal. In amateur radio, everyone self-identifies with their call sign; this follows the same model.
For amateur radio, there is a restriction against encryption (intent to obscure or hide the message), but the public messages would be fine. Private messages (being encrypted for only those with the right keys) might be a legal issue, so for a legit amateur radio deployment, the client would have to disable that (or at least operators would have to be educated that private messages may violate FCC rules).
> And I predict, in the next 5-7 years we're going to see a lot more power to the people (...) through decentralized social networking tools / platforms that can run in new types of topologies.
How do you see this happening in such a relative short amount of time? Who (else) is going to do this? Is our culture predisposed to do this, and, if not, is there a strategy to overcome this culture factor?
My friends and I have thought this through in detail a while ago, and have a few suggestions to make. I hope you make the best of it!
Distributed identity
Allow me to designate trusted friends / custodians. Store fractions of my private key with them, so that they can rebuild the key if I lost mine. They should also be able to issue a "revocation as of a certain date" if my key is compromised, and vouch for my new key being a valid replacement of the old key. So my identity becomes "Bob Smith from Seattle, friend of Jane Doe from Portland and Sally X from Redmond". My social circle is my identity! Non-technical users will not even need to know what a private key / public key is.
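A minimal sketch of how the key-splitting part could work, using simple XOR splitting where all shares are needed to rebuild the key; a real system would more likely use Shamir's secret sharing so that any k of n custodians suffice. All names here are made up:

```typescript
import { randomBytes } from "crypto";

// Split a private key into n shares such that ALL n are needed to rebuild it.
function splitKey(privateKey: Buffer, n: number): Buffer[] {
  const shares: Buffer[] = [];
  let acc = Buffer.from(privateKey); // running XOR, starts as the key itself
  for (let i = 0; i < n - 1; i++) {
    const share = randomBytes(privateKey.length);
    shares.push(share);
    acc = Buffer.from(acc.map((byte, j) => byte ^ share[j]));
  }
  shares.push(acc); // the XOR of all n shares equals the original key
  return shares;
}

// Custodians return their shares; XOR them all back together.
function recoverKey(shares: Buffer[]): Buffer {
  return shares.reduce((acc, share) =>
    Buffer.from(acc.map((byte, j) => byte ^ share[j]))
  );
}
```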
Relays
Introduce a notion of the "relay" server - a server where I will register my current IP address for direct p2p connection, or pick up my "voicemail" if I can't be reached right away. I can have multiple relays. So my list of friends is a list of their public keys and their relays as best I know them. Whenever I publish new content, the software will aggressively push the data to each of my friends / subscribers. Each time my relay list is updated, it also gets pushed to everyone. If I can't find my friend's relay, I will query our mutual friends to see if they know where to find my lost friend.
Objects
There should be a way to create handles for real-life objects and locations. Since many people will end up creating different entries for the same object, there should be a way for me to record in my log that guid-a and guid-b refer to the same restaurant in my opinion. As well I could access similar opinion records made by my friends, or their friends.
Comments
Each post has an identity, as does each location. My friends can comment on those things in their own log, but I will only see these comments if I get to access those posts / locations myself (or I go out of my way to look for them). This way I know what my friends think of this article or this restaurant. Bye-bye Yelp, bye-bye fake Amazon reviews.
Content Curation
I will subscribe to certain bots / people who will tell me that some pieces of news floating around will be a waste of my time or be offensive. Bye-bye clickbait, bye-bye goatse.
Storage
Allow me to designate space to store my friends' encrypted blobs for them. They can back up their files to me, and I can back up to them.
Right now I'm particularly interested in https://github.com/solid/web-access-control-spec although I think it's incomplete when it comes to data portability and access control. From what I've seen on re-decentralizing the internet, access control is either non-existent, or relies on a server hosting your data to implement access control correctly.
What if, in the WAC protocol linked above, instead of ACL resources informing the server, we could have ACL resources providing clients with keys to the encrypted resource (presumably wrapped in each authorized agent's pub key). Host proof data is a necessity for decentralized social networking IMO, even if the majority of agents would happily hand their keys over to their host.
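Roughly what I have in mind, as a hypothetical sketch (TypeScript with tweetnacl; this is not the WAC spec, just the key-wrapping idea): encrypt the resource once with a random symmetric key, then wrap that key for each authorized agent's public key, so the host only ever stores ciphertext and wrapped keys.

```typescript
import * as nacl from "tweetnacl";

interface AclEntry {
  agentPublicKey: Uint8Array; // recipient's curve25519 (box) public key
  wrappedKey: Uint8Array;     // the resource key, boxed for that agent
  nonce: Uint8Array;
}

function encryptResource(
  plaintext: Uint8Array,
  ownerSecretKey: Uint8Array, // owner's curve25519 (box) secret key
  agents: Uint8Array[]        // authorized agents' public keys
) {
  const resourceKey = nacl.randomBytes(nacl.secretbox.keyLength);
  const nonce = nacl.randomBytes(nacl.secretbox.nonceLength);
  const ciphertext = nacl.secretbox(plaintext, nonce, resourceKey);

  // One ACL entry per authorized agent: the resource key, wrapped for them.
  const acl: AclEntry[] = agents.map((agentPublicKey) => {
    const keyNonce = nacl.randomBytes(nacl.box.nonceLength);
    return {
      agentPublicKey,
      nonce: keyNonce,
      wrappedKey: nacl.box(resourceKey, keyNonce, agentPublicKey, ownerSecretKey),
    };
  });

  return { ciphertext, nonce, acl };
}
```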
For relays, that is more or less what the pub servers do. I connect to a relay, and if I subscribe to a channel or follow an individual, I end up with messages from months ago. The pub servers "gossip" with each other, so any particular pub server you connect to, should be able to catch you up on all of your friends and channels "gossip".
For the distributed identity piece is there a good reason not to rely on keybase.io?
It's also important that an initial smaller community would be targeted and that it would succeed there. FB did this with colleges; a federated one, in a world where FB already exists, would have an even harder time.
The Keybase server manages giving out usernames, and recording the proof URLs for users, and then your client hits the URLs, checks that the proofs are signed with the appropriate key, and caches them to watch for future discrepancies.
Keybase offers decentralized trust, in that the Keybase server can't lie to you about someone's keys -- your Keybase client will trust their public proofs and not the Keybase server -- but it's not a distributed/decentralized service as a whole, because you still receive hints from the server about where proofs live, and learn Keybase usernames from it.
(Speaking personally, not sure what an official Keybase opinion would be.)
No, I don't think the tech is quite there yet. Even just handing out human-readable usernames requires blockchain-style consensus, and we don't have a blockchain being followed along by everyone's machines to adjudicate consensus requests (yet!).
The folks at Blockstack Labs are doing fine work in this area, though: https://blockstack.org/
You did not answer my question, but I guess it does not matter with the new strategy. Looks like embrace-extend-extinguish to me – has it always been a goal of Keybase to replace PGP with something keybase-specific or did something change?
What the heck is Keybase? I went to the website and it does not offer a clue about what this is or how it works. It says "download the app" but it is not an app. It says it's more than a website but does not seem to be distributed in any way.
The fact that it failed at the most basic thing of actually telling what it is about, what it does and how it works would be a good reason not to use Keybase.
All I got back was "An error occured (sic) while attempting to redeem invite. could not connect to sbot"
It worked with http://pub.locksmithdon.net/ though I feel a bit odd trusting a "locksmith" I've never heard of to stream lots of data to my harddrive...
It's cool that anyone can host a pub – basically, an instance of FB/Twitter/Gmail, it seems – but things 1) will get expensive for them, and it's unclear how they'll fund that – and 2) now I have to trust random people on the internet – not only to be nice, but also secure.
As a "random technically aware netizen", I honestly trust fooplesoft more, since they have a multi-billion-dollar reputation to protect. (Not that I trust fooplesoft).
These pubs you mentioned are suffering under the large amount of traffic generated by HN and they were not designed for this load. Ideally hosting your own pub should be as easy as possible. My goal is to have it possible under a Heroku "Click to deploy" button or Zeit `now staltz/easy-ssb-pub` so that we can have more pubs. By the way, my pubs are public just because I chose to, but I may take that down if I want. No data would be destroyed, since you'd have all that locally and you can connect to any other pub and replicate through that.
All pubs on the wiki are indeed overloaded. Interestingly if one sets up their own, the other pubs eventually sync with it, only the desktop client seems to be unhappy with laggy pubs. Is that by design?
FWIW, you can use pub.lua.cz:8008:@xYSW6eVu8gTS/nTSXZiH97dgKZ+wp7NkomR6WKK/PBI=.ed25519~iQ16RuvjKZqy/RhiXXmW9+6wuZNq+SBI8evG3PotxvI= if you have trouble connecting to the ones on github.
Feel free to add it to the wiki, I do plan to run it long term, but I am not a github user.
You don't need to trust the security of pubs. Validation of messages happens through cryptographic signing and public messages are public anyways. You also don't need to trust that pubs will be online much because your followers will also help host your content.
> So, if I post a GB worth of diary entries, who ends up caching it by default? Just my followers? People I follow? Pubs I connect to?
your followers, their followers, and their followers (assuming everyone is using the default replication settings). These may include pubs or people you follow. If you are able to connect to a pub then most likely it is willing to replicate your feed.
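Put another way, what a node replicates is essentially a hop-limited walk of its follow graph, which is why your feed ends up on your followers' machines, their followers' machines, and so on. A rough sketch of the idea; names are made up and real clients have more knobs (blocking, per-feed settings, etc.):

```typescript
type FeedId = string;

// Collect every feed within `maxHops` steps of `self` in the follow graph.
function feedsToReplicate(
  self: FeedId,
  follows: Map<FeedId, FeedId[]>, // who each feed follows
  maxHops: number
): Set<FeedId> {
  const selected = new Set<FeedId>([self]);
  let frontier: FeedId[] = [self];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: FeedId[] = [];
    for (const feed of frontier) {
      for (const followed of follows.get(feed) ?? []) {
        if (!selected.has(followed)) {
          selected.add(followed);
          next.push(followed);
        }
      }
    }
    frontier = next;
  }
  return selected;
}
```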
Why do all "social networks" have to be a feed of news? Couldn't anyone think of anything better than a system in which people are only encouraged to talk about themselves and try to get other people's approval? In which having more "friends" is always better, because you have more potential for self-aggrandizement in your narcissistic posts?
With SSB, you could make a UI that renders the content in a different way, such as a HN-style thing, and use it on the existing network alongside people using other clients.
The social aspect is important though because in this architecture what you see is determined by who you follow (and who they follow, etc.)
If you want a "modern web" example, there's nntpchan - despite the name, it's not related to Usenet directly; it only uses the same method of federated pub/sub replication.
presenting social content in a different format would be interesting, but I've not seen any compelling options. But the issues of looking for approval and social status are social issues - not sure how a social network tool could avoid that. Isn't that what a lot of people's casual relationships are like? What do most people do at parties and social events in person - they talk about themselves and each other...
but hacker news isn't really about social content. Some of that might creep in for some users, and yes the karma thing does pollute motivations for some, but I've never thought of it as a social network. Some networking is done here, but only indirectly I think.
Maybe I'm not thinking about it right or use it differently than most :P
Yeah, it is not a "complete" social network, I was just using it as an example that there are places in which people aren't so narcissistic as in Facebook.
People are narcissistic by design. Social Networks were created to harness that narcissism to make money. I argue that there are a lot of other 'social networks' on the internet in a technical sense. You had profiles and added friends and then sent things to them/chatted with them on AIM, LiveJournal, any of a number of message boards, blogs/comments, etc.
The deciding factor between what came before and facebook and twitter is the ability to broadcast to the entire social network at once, so all of the world can see your brilliance! Feeding into that narcissism is the killer feature of modern social networks.
You have a point there. "Social content". What is "social content"? Is it people talking about themselves and what they think and showing pictures of themselves? Is that what you do with your best friends? You arrive at their houses and start talking about yourself without being asked?
No, that's not what I (and presumably you and others here) do. We are nerds. We like talking about things that make us think.
But yes, for the majority of people, talking about themselves is exactly what they do. They talk about their vacation to the beach. They talk about the drama going on at work. They talk about their sister's date. They don't talk about advances in database design.
I was at a live event (a play) recently and was fascinated by a small group of women in their late 20s / early 30s. They spent a good 10-15 minutes before the play started just taking pictures of themselves being at the play and posting it to their social networks. They talked about the pictures, asked others to send them their copy of the picture. They took pictures from one angle and then another. They talked about who "liked" the picture they just uploaded. It went on and on and on. Not once did I overhear them talking about the play they were about to see. It seemed to be not the point at all. The play was just a hashtag for their social media posts.
hmm. I think people expect their friends to do some posting about their life or interests on (our current) social networks - so they implicitly have permission to do so. We don't expect our friends to immediately start talking about themselves in-person, but I do think we expect our friends to talk about themselves and what's going on in their lives - it would be pretty shallow friendships if we only talked about the weather or the news headlines.
Most good conversationalists are good at it because they explicitly draw the other person into talking about themselves and their interests. Whether things become narcissistic is more a factor of personality, I think. Perhaps it's more than that, though. A good conversationalist would steer the conversation to more interesting content - i.e. why the person is passionate about their hobby rather than just their accomplishments. Perhaps we need to think about social network features that model what good conversationalists do? Not sure what that looks like, though.
Blogging is about the same as posting in your own "feed" on any social network.
See, for example, the indiewebcamp[1] people, who are against "silos", as they call Facebook et al., but are recreating their same functionalities with personal blogs and a new version of Pingbacks called "webmentions".
But the pingback was more about making your blog post a comment on someone's post. So it was creating a conversation. I could read a post, respond to it on my blog and ping back to the original. That way my feed was a mix of the discussions I'm having with others as well as my own stuff. I suppose if you take away the "feed" element, you could call web forums and Usenet "social networking".
> But the pingback was more about making your blog post a comment on someone's post.
That's what "webmentions" do.
> That way my feed was a mix of the discussions I'm having with others as well as my own stuff.
Here's something I like: everything you say is part of a public discussion, so you're never just talking to yourself; comments have about the same weight as standalone posts, and outsiders can join the discussion, since it isn't restricted to your current circle of friends.
What you are describing has existed for years and we called it Usenet. Or a forum. Or a mailing list. I can accept that "social networking" is a bad term, but in popular usage it encompasses the "personal feed" almost by definition.
> I can accept that "social networking" is a bad term, but in popular usage it encompasses the "personal feed" almost by definition.
Yes, but just because no one has tried to create a different social network. That's why I made my initial comment in the first place.
> What you are describing has existed for years and we called it Usenet. Or a forum. Or a mailing list.
I don't know about Usenet, but forums and mailing lists are generally oriented to narrow topics; they're not places where you'll find your school friends, or people discussing varied subjects across multiple areas.
It doesn't have to be. I have my blog using Pelican, no comments, no pingbacks, nothing. I write about stuff that interests me because I feel like it. If someone finds one of my articles and they find it interesting, good for them. If they don't, good for them too.
On Twitter and Tumblr you can make extra accounts to participate in discussions you're interested in, and select people to follow based on that, so the feed system is okay for talking about things other than yourself if the feeds don't include everyone you know by default.
Tumblr has some pretty good discussion about movies and books.
Twitter is not so good for discussion because of the length limit, but there's plenty of people posting concise observations and jokes rather than posting about themselves.
On both systems, people can reply to content from strangers, and there's lots of conflict arising from that.
I do think Tumblr would be improved by making it easier to have discussions that don't go to all your followers by default, for example like on Twitter where if you tag people at the start of your tweet, it doesn't go into the main feed for your followers who aren't tagged.
Or you can go all the way to partitioning a system into topics, as with Reddit. I wouldn't call that a social network though, you don't just casually start a conversation with people you've chosen to connect with, you start a conversation with a subreddit.
Since the author didn't mention it, the original creator of the patchwork project is https://github.com/pfrazee
When I used it, which admittedly was a long time ago now, the biggest setback was the lack of cross-device identities. So I ended up having two accounts with two feeds, `wesAtWork` and `wes`. Maybe they have solved this by now.
ps. Does patchwork still have the little gif maker? Because that was a super fun feature.
Also, because Paul has awesome projects, and deserves some attention when a project of his makes it to the top of HN but doesn't even mention him, he is working on a browser for the distributed web called Beaker (I am using it to write this now), and it is awesome.
@cowardlydragon you got downvoted to death but that's a fair assumption so I want to reply to you here
> forking a website so easily also makes spoofing very easy...
A fork copies the files of a site, so yeah, it certainly would be easy to spoof somebody's site. It basically is a spoof button. But doing so creates a new cryptographic identity for the site, and that will be the basis of how we authenticate.
Cross device identity is still an issue, but not a problem in the foundation. It's a matter of making client apps (like Patchwork) recognize a message of type "link this and that account together" and then your friend's app would automatically follow both accounts and render them as if they are the same thing. It'll be done eventually in Patchwork.
Well, yes and no. The log will show a different id (public key) which authored the message. But the device itself (iPhone or Google Nexus or whatever) doesn't need to be mentioned.
That could leak information a user doesn't want to be leaked, like at which hours he is at work (using the work computer) etc.
Which id belongs to which device could probably be inferred when the service is used actively.
I understand that transparency might not be a design goal or technically possible, I'm just raising the concern.
Can't I just share my private key across multiple devices?
Nothing stops you from copy-pasting your asymmetric keys (it's a file) to different devices. I bet it's feasible, the biggest issue is also making sure your log stays the same, because a log shouldn't get forked.
yes.
1) It would be significantly less secure - compromising either device would compromise both. Imagine an airplane with two engines that needs both to fly - a single-engine plane is actually safer, because the chance of losing one of one is less than the chance of losing one of two (assuming the chances of engine failure are independent).
Using a separate key on each device is like a two-engine plane that can still fly with one engine - this is significantly safer than a single-engine plane.
2) it would greatly complicate the replication protocol, having to take into account forks, rather than assuming append only, where you can represent the current synced state with a single counter.
I'm having trouble following this and the reply thread below. Why is identity device-specific? So every time a get a new computer I have a new public key?
You can also use the same keypair on multiple devices. This however results in another problem: You could post content from both devices simultaneously. But the underlying protocol requires each message to refer to the previous message by the same identity. So if two different devices post a message without having received the message of the other one, one of the messages is considered invalid.
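A rough sketch of that chain rule; field names are illustrative, not the exact SSB message schema:

```typescript
import { createHash } from "crypto";

interface LogMessage {
  author: string;          // public key of the feed
  sequence: number;        // 1, 2, 3, ...
  previous: string | null; // hash of the previous message, null for the first
  content: unknown;
}

const hashOf = (msg: LogMessage): string =>
  createHash("sha256").update(JSON.stringify(msg)).digest("base64");

// True if `next` is a valid extension of the feed whose latest message is `tip`.
function extendsFeed(tip: LogMessage | null, next: LogMessage): boolean {
  if (tip === null) return next.sequence === 1 && next.previous === null;
  return (
    next.author === tip.author &&
    next.sequence === tip.sequence + 1 &&
    next.previous === hashOf(tip)
  );
}

// If device A and device B each append message N+1 without seeing the other's,
// both messages pass this check against the same tip, but replicas can accept
// only one of them; the feed is considered forked and the other is rejected.
```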
Can you (or someone) clarify the difference between Patchwork and SSB? Does SSB handle the networking and discovery and encryption and whatnot, and Patchwork just acts as front-end for displaying diaries, connecting to pubs, posting and so forth?
Patchwork is a user interface for displaying messages from the distributed database to the user, and to allow the user to add new messages. The underlying protocol supports arbitrary message types, patchwork exposes a UI for interacting with a subset of them. Anyone could write and use other UIs while still contributing to the same database. Patchbay[1] for example is a more developer-centric frontend.
Under the hood, patchwork connects to a scuttlebot[2] server. Scuttlebot in turn is based on secure-scuttlebutt (ssb).
The downvotes on replies are baffling over here. Here's what AljoschaMeyer said, and it's all accurate:
Patchwork is a user interface for displaying messages from the distributed database to the user, and to allow the user to add new messages. The underlying protocol supports arbitrary message types, patchwork exposes a UI for interacting with a subset of them. Anyone could write and use other UIs while still contributing to the same database. Patchbay[1] for example is a more developer-centric frontend.
Under the hood, patchwork connects to a scuttlebot[2] server. Scuttlebot in turn is based on secure-scuttlebutt (ssb).
[1] https://github.com/ssbc/patchbay [2] http://scuttlebot.io/
This excites me. I'm probably naive, but I always imagine that one day I'll retire and spend my days trying to work on an open source mesh network (or something similar).
I want future generations to live in a world where 'the internet' isn't a thing that authorities can grant/deny. A headless social network is a promising omen of a headless internet.
I've been thinking about this very thing the past few days!
Forgive the rambling, this is the first time I've written any of this down...
My idea is to use email as a transport for 'social attachments' that would be read using a custom mail client (it remains to be seen if it should be your regular email client or have it be just your 'social mail' client. But... if using another client as regular email, users would have to ignore or filter out social mails). It could also be done as a mimetype handler/viewer for social attachments.
Advantages of using email:
- Decentralized (can move providers)
- email address as rendezvous point (simple for users to grasp)
- Works behind firewalls
- Can work with local (ie Maildir) or remote (imap) mailstores. If using imap, helps to address the multiple devices issue. Could also use replication to handle it too (Syncthing, dropbox, etc)
Scuttlebutt looks like a nice alternative though. Will be following closely.
I had been thinking about something like that too some years ago. The subject or first line of the mail would act as headers for the mail client extension parser. You could tag the social object you send out (event, picture, status update) and users could subscribe to those (the client would just filter the rest out; it solves the problem of being interested in an author's upcoming books and social comments but not in his comments on his family vacation). Likewise you could choose who gets your updates.
Problem is, you don't have a means to publicly advertise your status and offer a way to subscribe. That would be a third-party provider. I can imagine someone fetching everyone's updates and providing a mechanism to just resend the mail via a public web repository that would act as a public registration hub.
That would be a huge data mine though. Unless you add pgp in the mix and then you have to hit the mark on the client pgp handling to easily allow close friends to give out their public key.
Wouldn't that make a fun POC project ?
I remember I was thinking about it when pownce came out.
I still believe the net would be so much more fun with the likes of pownce and w.a.s.t.e around :(.
I remember having some actual conversations on w.a.s.t.e. That's never happening with torrents.
That's absurd. True, it has become more complicated than it once was, but that's every technology that isn't dead.
Granted, I have been running mail for a long time, so I got to learn the complications as they happened, rather than all at once. But anyone who can set up a production-quality web server/appserver/DB along with the accessories that go along with it can handle it.
Now if email isn't important to your business and/or you just don't want to deal with maintaining it, that's valid. But it just isn't as difficult as a lot of people seem to want to make it out to be.
" I have been running mail for a long time, so I got to learn the complications as they happened, rather than all at once. But anyone who can set up a production-quality web server/appserver/DB along with the accessories that go along with it can handle it."
Not in a long time. I haven't found a situation in which I couldn't use Postfix in quite a while. Although the occasional sendmail.cf flashback still hits me.
I like this idea and think it has legs. But, eventually it might be worth rewriting daemons for SMTP/IMAP that have a simple config file format that would allow simple white/blacklisting domains such that one could run a server that would reject all emails from any domain other than the 6 one specifies. Further, without pgp encryption/signing msgs are discarded.
I think such an approach could be interesting, but it seems there is a need for a non-profit to govern such a thing.
I think of a hipster as someone who follows (non-mainstream) trends, goes to Starbucks, loves Apple products and cannot live without Wifi. Not a hacker who builds their own stuff and cares about privacy. But maybe the beard confuses people.
It's hard to define the term, but I think at the most basic level, "hipster" specifically refers to anyone who enjoys being alternative. "Goes to starbucks, loves Apple products and cannot live without Wifi" are common qualities of people who want to be hipsters but only because they think it's cool (and don't actually embrace an alternative lifestyle).
It's pretty funny because of the stereotype. On the other hand, it makes sense that someone dedicated to improving the state of privacy and personal data ownership lives on a boat. If he decides the government becomes too authoritarian, he just leaves with all his stuff. No go-bag needed because he has a go-house. Lay low in SEA near a wifi hotspot or something. You can't escape the reach of corporations, but he (and others) are working on it in the form of scuttlebutt.
Substack is awesome. I think he's very much a modern day instance of the old hacker ethos. He's so creative, too- I've really enjoyed some of his project ideas and presentations I've been fortunate to see.
I am not much of a social networking type of person, but I have wondered how nice it would be to network with a community like HN. For example, I see a nice comment chain going on in some news article, but as the article dies so does all the conversation within it.
Maybe it's just me but if I see an article is x+ hours old (15+ for example), I don't bother commenting.
What type of social networking would HN use for non personal(not for family and immediate friends) communication? (I've tried hnchat.com, it's mostly inactive imho)
I still like IRC. For example, the science-based channels on freenode.net are pretty good: #biology, ##physics and ##math. Might not be the worst of ideas for a mod to make one for us :s (provided they already have an IRC client going, it wouldn't be too much added hassle).
So it seems there are two ways to exchange information:
1) be on the same wifi (presumably great for dissidents in countries with heavy-handed internet control, and inconvenient for everyone else)
2) use "pubs", which can be run on any server, and connected to ¿through the internet?
So most users would use pubs, which are described as "totally dispensable" (a nice property). But how can users exchange information about which pub to subscribe to? Is there a public listing of them?
It seems like the "bootstrapping server" problem (eg; reliance on router.bittorrent.com:6881) will still exist in practice. For that matter, is there currently an equivalent to router.bittorrent.com that would serve this purpose?
This seems like a potentially significant project, and I'm excited by the possibility that it might actually take off – hence the inquiry.
It depends if you want to use it like Twitter (public announcements) or like Facebook (closed small/medium circles). If you use like Facebook, then it's enough that one person among your circle of friends (probably the most tech-savvy one) would host a pub and use that for their friends. You can see how you would probably be connected to a few pubs, because you usually have different circles of friends. If you want to use it like Twitter, then indeed we might need a DHT, but the point there was the resilience of the network.
Okay... in the FB case, though, my friend has to send me the IP/DNS address of their pub somehow, right? eg; Signal?
What about organizing groups, which might currently use Slack? For example, political dissidents who don't necessarily all know each other personally. They must use some other communication channel to communicate pubs?
Item 1 also applies to people trying to communicate in post-disaster scenarios (e.g. earthquakes).
I know some people who work in that area, and every time one of them finds out I work in software their first question is about mesh networking. If SSB is what it seems to be (user friendly, no-frills ad hoc mesh networking) then that would be huge for emergency and disaster planners! Is it mature enough to be used in this way?
The equivalent to a bootstrap node is the pub to which you used an invite code to connect to when joining the network (unless you connect to the network in a different way, like by being followed by someone else on your LAN). When you use an invite code, you publish a message with the pub's address on your feed. When peers replicate your feed, they see those messages, and thereby find out about new pubs to connect to.
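Schematically, the discovery side of that could look like this (the message shape is invented for the example, not the exact SSB schema):

```typescript
// When someone redeems an invite, a message announcing the pub's address lands
// on their feed; anyone replicating that feed learns about the pub.
interface PubAnnouncement {
  type: "pub-announce";
  host: string;
  port: number;
  key: string; // the pub's public key
}

interface FeedMessage {
  author: string;
  content: { type: string } & Record<string, unknown>;
}

// Scan the feeds we've replicated for pub announcements we could connect to.
function discoverPubs(replicatedMessages: FeedMessage[]): PubAnnouncement[] {
  return replicatedMessages
    .filter((m) => m.content.type === "pub-announce")
    .map((m) => m.content as unknown as PubAnnouncement);
}
```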
Can I choose whose content I pass along? I am OK distributing my own feed; that's presumably why I am joining the network. I am not OK passing along someone else's hate speech, porn, warez, malware, spam, etc. I'd like to be able to review the feeds available and say "Yeah sure, I'll pass that around." If everything in a feed is encrypted then I'd need to decide. Also, my brother whose feed I follow and pass along may upload a really nasty bit of content and I may relay it.
Your computer will only help host data that was hosted by people that you follow. If you don't want to spread content that you disagree with, don't follow people who post such content.
Your freedom to spread filth is precisely equal to my freedom not to repeat it.
Also, please remember that the American First Amendment limits Government speech restrictions. Private communities and individuals can make any rules they want about social acceptable speech.
Yes. Users actively replicate data, not passively; they have inherent control over what they pass on. If a community doesn't want to replicate your data, then go find one that does.
What blogging system? Who provides the infrastructure? Getting back to pull/subscriptions via RSS would make me happy too, but this doesn't solve the problem of whose platform we are all sharecroppers on.
You can write a JavaScript component to read out the feed in the background and transform it into an RSS feed. Current U.S. law prevents someone from offering this commercially, especially in a SaaS package.
The storage requirements are tremendous, though, right?
If I want to have access to everything that's been shared with me, I have to store it all. In the case of images, the storage burden can get large quickly.
There are basically two types of storage: logs and blobs. Logs were described in the blog post, but blobs weren't. Blobs are mostly images and that type of stuff, and are stored in LevelDB. It can easily get to 1 GB or more. The trick is that blobs aren't sigchained, so they could be garbage collected, and that is something that we're working on. Logs can't be garbage collected, but they grow slower than blobs do, and are usually around 100 MB or less.
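To illustrate the split, here's a hypothetical sketch of what blob garbage collection could look like (this is not current Scuttlebutt behavior, and all names are made up): log entries are part of the signed chain and are never dropped, while blobs are addressed by hash, so they can be deleted locally and re-fetched from peers later.

```typescript
interface BlobStore {
  allHashes(): string[];
  lastAccessed(hash: string): number; // ms since epoch
  delete(hash: string): void;
}

// Drop blobs that haven't been touched for `maxAgeMs`; the signed log that
// references them stays intact, so peers can serve them again on demand.
function garbageCollectBlobs(
  store: BlobStore,
  maxAgeMs: number,
  now: number = Date.now()
): string[] {
  const dropped: string[] = [];
  for (const hash of store.allHashes()) {
    if (now - store.lastAccessed(hash) > maxAgeMs) {
      store.delete(hash);
      dropped.push(hash);
    }
  }
  return dropped;
}
```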
Well... I've been on there for quite some time; granted it hasn't been mega active, but here is a rundown of how much space it has taken so far: there is the main sigchain database, which stores all the messages (follows, posts, ...) and is now 150 MB in size, and there are the blobs (binary attachments like images), which take up about 500 MB. YMMV depending on how many cat pictures your friends share, of course.
The flipside to your remark is that it is fully offline-capable, and I'm perfectly happy with that. Also: contrast it with how much space a Thunderbird profile takes up.
How would that change if you had, say, 5,000 friends – the fb limit, which some people do reach – who were posting multimedia content multiple times a day (which happens on fb)?
Is the protocol set up in such a way as to enable easy, automatic deletion of old data from local devices, while still storing them for easy search/scroll-based access on the Pub servers?
That's going to be a bigger problem on mobile than it is on desktop (I mean, not literally those amounts, but the amounts you'd get from a busier feed).
The entire Stack Exchange and English Wikipedia dumps, including all media, are less than 90 GB. Even low-end cell phones have memory expansion slots up to 128 GB. Whatever you plan to do socially, maintaining a local copy is not a storage issue. None of the cloud people will tell you that, though.
But I bet Facebook is much larger. If this were to really meet my needs (i.e. people I actually know start using it regularly), I can see this becoming an issue that needs to be solved, especially on mobile. Bitcoin ran into this issue as well, the need for a client to get the whole blockchain. I can see some solutions once cross device identity is done where I get a small amount of the network, perhaps the most recent, then it syncs up with the larger storage on my home PC later.
> Thank goodness Facebook isn't what I want my social feed to look like... all those GIFs and garbage updates.
Sure, but I have few enough friends as is. I know literally no one who would use this, as neat as it is. Bootstrapping a social network is hard for both developers and users, but once it gets going, storage requirements would rise fast.
Does Scuttlebutt intend to store every post forever? Or could posts 'expire' and get deleted, like on Usenet? It would be on you to save the content you wanted to have long term. Somebody could always take on the burden of capturing an archive of the whole thing in perpetuity and provide web access to it, like the archive.org does for usenet.
Check other comments here in HN thread about "blobs" and garbage collection. But also, there is the easy possibility of just starting a new account. In fact, substack did this, we refer to his old account as "deadsubstack".
Note that after the turn of the 21st century, people were not expiring non-binaries posts on Usenet.
I observed in 2011 that HighWinds Media had not expired any non-binaries postings since 2006, and that Power Usenet had not expired a non-binaries posting for eight years ("3013+ days text retention" was in its advertising at the time). People effectively just turned non-binaries expiry in Usenet off, in the first few years of the 21st century. I did on my Usenet node, too.
I observed then that the Usenet nodes' abilities to store posts had far outstripped the size of the non-binaries portion of a full Usenet feed, which was only a tiny proportion of the full 10TiB/day feed of the time.
The distinction of binary and non-binary posts on Usenet is paralleled by the separation of messages and blobs on Scuttlebutt. As staltz [explained](https://news.ycombinator.com/item?id=14051181), we can garbage collect ("expire") blobs, but not message logs (although a client could do so with the current APIs, it would have security/trust and UI implications, and I'm not aware of any clients doing so).
We are also basically betting on the size of our message logs to generally grow slower than our individual storage capacities, and it is interesting to know that that worked for Usenet too. For blobs, we will likely develop some garbage collection or expiring approaches. Since the network is radically decentralized, each participant can choose their own retention policy. You can, in fact, delete all your blobs (`rm -rf ~/.ssb/blobs`) and assuming some peers have replicated them, your client will just fetch them again as you need them.
You're comparing apples and oranges. Usenet was federated across beefy corporate servers. No single user had to walk around with the entire Usenet archive on their laptop.
I wasn't comparing anything. I was proposing a suggestion and only mentioned usenet as a means of explanation.
> I'm simply pointing out the error in the premise of your question.
No, you made a non-sequitur factual post about Usenet. I see no actual error pointed out. The fact that Usenet stopped expiring non-binary posts after most of their traffic fled to other services is not a valid argument against possibly using the feature in a peer to peer distributed social network.
If you don't see an error in your premise being pointed out, then you need to put your "posts expire and get deleted, like on Usenet" right up against "people were not expiring non-binaries posts on Usenet" until the penny drops.
Then you need to notice the point, already made by others as well, that the premise of ISL's question is erroneous, too. The storage requirements are not necessarily "tremendous", if one actually learns from the past. Again, your comparison to Usenet needs to involve considering how Usenet treated binaries and non-binaries very differently. (One can look to experience of the WWW for this, too, and consider the relative weights in HTTP traffic of the "images" that ISL talks about and the non-binary contents of the WWW. But your comparison to Usenet does teach the same thing.)
Your and ISL's whole notion, that everything is going to get tremendously big and so everything will need to be expired, rather flies in the face of what we can see from history actually happened in systems like this, such as the one that you made your comparison to. Usenet did not expire and delete non-binaries posts.
By making this comparison and then trying to pretend that it's someone else's non-sequitur you are closing your eyes to the useful lessons to actually learn from your comparison. Usenet, and the Wayback Machine, and the early WWW spiders, and Stack Exchange, and Wikipedia with all of its talk pages, and Fidonet in its later years (when hard disc sizes became large enough), all teach that in fact one can get away with keeping all of the "non-binary" stuff indefinitely, or at least for time scales on the order of decades, because that is not where the majority of the storage and transmission costs is.
People have already danced this dance, several times, and making a distinction between the binary and the non-binary stuff and not fretting overmuch about the latter when one looks at the figures is generally where it ends up.
I was thinking the same thing, and I don't know enough (or anything really) about this to comment on it, but my second thought was that this probably works like BitTorrent, where you don't need all of a file to make sense of the individual pieces.
Let's say for instance that the file you're downloading is a long text file containing a novel, but all you care about is chapter 3. Then all you need are the pieces for chapter 3 – the rest can stick around in the ether somewhere.
This is harder to do with bags of bytes obviously – how do you know which bytes belong to chapter 3? – but if the pieces are self contained messages where you don't need either the previous or the next to make sense of it, then it should be trivial to link to them and the distribution could work like this. Whether it actually works like this or not I have no idea. Sounds like an interesting project anyway!
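For what it's worth, the usual trick for "self-contained pieces" is content addressing: if each piece is keyed by the hash of its own bytes, a peer can fetch and verify any single piece without the rest. A tiny sketch:

```typescript
import { createHash } from "crypto";

// The address of a piece is the hash of its bytes.
function contentAddress(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Anyone holding the address can verify a piece fetched from an untrusted peer.
function verifyPiece(expectedAddress: string, bytes: Buffer): boolean {
  return contentAddress(bytes) === expectedAddress;
}
```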
Yeah, I've been working on something similar to this off and on. Functionally, the mechanics need to be similar to a distributed filesystem like Tahoe-LAFS or Freenet. Content has to be entrusted to the swarm.
They should break it up by time. For example your database only syncs the past month or so as needed and you can choose to request more if necessary. Someone mentioned 150MB right now. That thing is going to get massive eventually.
The quoted 150 MB is for all messages in one's network. On my local node there are 91773 messages, going back roughly a year and a half, taking up 147 MB - of which about 72 MB is the actual messages, and the rest are indexes. gzipped, the 72 MB of messages goes down to 29 MB.
I think I missed something. If information is exchanged when machines are on the same network, how does the guy in New Zealand get updates from the guy in Hawaii? Is there a server involved, or does the New Zealand guy have to wait until he is on a network with someone who has already connected with the Hawaii guy?
Information can also be exchanged using legacy internet.. ;)
kidding aside: we just introduce a few instances that act as exchange points (also called "pubs").
I played a little with the idea of using tor hidden services to directly connect to people, so that you don't need another computer that runs all the time.
(Public) information is not only shared between friends, but also between friends of friends. So as long as they have common friends connected to the internet, the data flows without problems.
To help with this situation, the network includes so called pubs. These are basically bots that run 24/7 and friend people. The article very briefly mentions them. More information here: https://www.scuttlebutt.nz/concepts/pub.html
I'm not totally sure how the traffic management works, but what I would like to know is how services like this will be able to scale? What happens when there is a Pub with millions of users? Does it creep to a halt? Is there a need for dedicated Pub machines? If so, Who funds/maintains them? Does this lead to subscriptions?
Decentralized social networks seems like an inevitable progression as internet users become more aware of their privacy and ways they can improve online relationships and ...."social networking"
It scales by each "circle of friends" or community having its dedicated pub, set up by some tech-savvy person. Pubs should be easy enough to setup for anyone who knows what a VPS is. A community or circle of friends is usually not millions of people.
For the "social media aspect", like in Twitter, we're looking at making alternative types of pubs. Imagine having a pub dedicated to only replicating your content (and no one else's). Or multiple of these. So that whoever wants to follow (if you're like Elon Musk famous) can just follow one of your pubs.
I love the idea of a decentralized community.
I would just be skeptical that even 1% of the population knows what a VPS is, and fewer would have the urge to set one up to talk to their friends.
Not suggesting that you need to have a global audience, but just something I was curious about. Thanks for your reply.
I'm not sure how this project in particular handles it, but I see quite a few projects that handle it by simply using a user's followers as their replicas.
Since the message is signed there is no risk of it being altered or forged, so I can get the messages of user FOO from user BAR as long as BAR is "following" FOO.
This scales well, since the more popular you get, the more peers can replicate your messages and posts.
It is certainly meant to scale; it's an eventually consistent decentralized database. Centralized services historically have difficulty in scaling, but distributed and decentralized ones are by design easier. In Scuttlebutt, we don't need to have everything connected to everything, so it's easily scalable. You just need to give up the idea of a global singleton, global search indexes, that type of stuff.
Right now Mastodon might as well be off-grid, unable to add additional accounts on the main server. Popularity has stunted its growth!
I am not sure how much thought has been given to the scalability of this solution, it sounds like it will benefit from most of the advantages offered by P2P in this department.
Eventually something like this could organically grow into the "next Internet", in much the same way that the current internet has morphed into what it is today.
My point was not what I could do but rather what I will do to try out some random new social network. Having now read that migrating identities is currently impractical I am even more certainly not going to take a chance on some other random server or even my own!
How well has federation worked out in practice (for other federated, social network related protocols) so far?
As far as I know, federation has only worked for ancient stuff that has nothing to do with social networks, like email and DNS. Basically, it is a part of core functionality and thus can't be co-opted by commercial interests (though GMail has made quite an inroad!).
Until it has proven itself, social federation doesn't really seem like a strength to me. It does sound good in theory! Other people with actual experience are adding their anecdotes which lines up with what I'm trying to say.
> As far as I know, federation has only worked for ancient stuff that has nothing to do with social networks, like email and DNS. Basically, it is a part of core functionality and thus can't be co-opted by commercial interests (though GMail has made quite an inroad!).
Email only works _because_ of big players like GMail. Running your own server spam-free and off blacklists is an endless task.
DNS is going a similar way, with more and more ISPs resorting to hijacking DNS lookups for all sorts of nefarious reasons. This protocol seriously needs a broadly embraced signature system to validate origin.
The post starts by introducing two people (one in a boat in the ocean and another in the mountains in Hawaii) and states that they are communicating with each other. I thought this post was about some new long-range wireless protocol that sync'd via satellites or some such. I was disappointed to see this:
> Every time two Scuttlebutt friends connect to the same WiFi, their computers will synchronize the latest messages in their diaries.
Ultimately this technology seems to be a decentralized, signed messaging system. What problem are they solving? That facebook and twitter can delete and alter your messages?
Meanwhile I'm in search of a long-range, wireless communication system that can function like a network without the need of an ISP. Anyone know anything about this?
>Meanwhile I'm in search of a long-range, wireless communication system that can function like a network without the need of an ISP. Anyone know anything about this?
Mesh networks/WANETs. But you need enough adoption for the network to be considered long-range. Generally they are local-only.
> For instance, unique usernames are impossible without a centralized username registry.
This is Zooko's triangle and was squared by blockchains. Namecoin (2011), BNS (the Blockstack Name System, 2014), and now a bunch of other fully-decentralized naming systems can give you unique usernames. Recently, Ethereum tried launching ENS and ran into some security issues and will likely re-launch soon.
Problem is, I don't want to be assigned a username. I hate it when I get assigned a username. I want my username. If you hand me a username of "$&OdUgr606cZ", I will never remember that, I will never share that, and I will consequently never ever log in.
But it doesn't matter because this issue is already solved. We already have globally unique usernames. They're called email addresses, they are unique by their very nature, and they are (for all intents and purposes) already decentralized.
> But it doesn't matter because this issue is already solved. We already have globally unique usernames. They're called email addresses, they are unique by their very nature, and they are (for all intents and purposes) already decentralized.
No, they're not: billg@microsoft.com depends on microsoft.com, which depends on com, which depends on the root nameservers, which are … a central nameservice.
That's the whole point of Zooko's Triangle: of secure, decentralised and human-readable, you can have at most two. Global-singleton approaches are still centralised (the singleton is the centre), although they may build the singleton in a decentralised fashion.
I think you misunderstand what the phrase "for all intents and purposes" means. It doesn't mean "literally, 100% true"; it means "true enough for this argument". What network does your blockchain run on? It still relies on Comcast to get to my house, right? Because you want it to run over the Internet? Maybe you're using AT&T? Probably L3 is in there somewhere, but you're still relying on a centralized piece of equipment somewhere, and you're probably going to have a .com or .org to advertise it, and you might have a Wikipedia page or a Facebook group or collaborate on development on GitHub and chat with your team on Slack and exchange files on Dropbox and send messages on Gmail and you log into all of those services with... your globally unique email address. Possibly using a domain you own, with the mail exchange hosted on a server you own that you set up specifically for this project.
Maybe I'm missing the point, and I would look to you to explain to me what that is. But I guess congrats, you don't rely on ICANN anymore...
> I think you misunderstand what the phrase "for all intents and purposes" means. It doesn't mean "literally, 100% true"; it means "true enough for this argument".
Email addresses aren't in any way decentralised. Saying they are isn't true enough.
> What network does your blockchain run on?
The product in question _doesn't_ rely on ICANN, or Comcast running to your house; it can work without either of those.
xyr point seems to be that your claim that e-mail addresses are decentralized is faulty. No amount of "Well you are not decentralized in your block chain, either." is going to rebut that. Indeed, it actually reinforces the argument that your claim was faulty, by implicitly agreeing to it with a "but neither are you" response.
So perhaps you would like to now explain how e-mail addresses are a system without a centre. Bear in mind that you yourself have just made the point about ICANN being at their centre. (-:
That's only possible with what SSB Handbook calls a "global singleton". That's what I meant with "centralized username registry", which SSB does not have.
Does it normally take this long to index the database? It's been a long while since I started the app.
I thought this could be a nice tool to use in places like Cuba, but I've now realized that once connected to a pub it downloads more than 1 GB; that would also be a problem in a place with little internet bandwidth.
In places lacking internet bandwidth, people could run pubs in hackerspaces, schools, offices, homes, Actual Pubs, etc. A pub in a place that people frequent would gossip messages for the people, so they would not all need to connect to the internet all the time. Even the pub itself doesn't have to connect to the internet for it to be useful, as it would still help messages spread when people connect to it. As long as someone in the network connects to the Internet at least once in a while, people will be able to communicate with the broader network. With this architecture we can make more efficient cooperative use of network bandwidth.
Basic question: since the entries form a chain and reference the previous, is there no way to edit or delete your old entries? (I see it "prevents tampering" and there's something of a philosophical question here about whether you're "tampering" with your own history when you editorialize -- I agree with the crypto interpretation, but in the context of offline interaction, social communication isn't burdened with such expectations of accuracy or time-invariance.)
If so I see that as a fairly large limitation for the common user. Even though truly removing something from the internet is effectively an impossibility, I think most non-technical folks aren't actively aware of this, and I'd at least like the option make it harder for folks to uncover.
There is no way (as far as I know) to delete old entries, and I think this is good because with gossip mechanics we cannot lie to ourselves: there is no way of stopping that information from spreading.
What's possible, on the other hand, is to make a message type "ignore the previous" which client apps would interpret to hide them, but obviously a client app can be configured to not hide them.
Yes, I suppose I'm targeting the protocol a little too much when I should think about what features clients can implement.
I'm certainly happy with the "gossip" approach; I just see it as challenging for some people to adopt when they are coddled with the idea that they can censor their past.
Another very important point about no-deletes is: you can't deny history. You can't, in real life, take back something you said. The mechanics of digital deletion in centralized systems are a big problem for History. I remember Julian Assange describing a corruption scandal that was essentially erased from History because a digital article by a large media company was deleted and not replicated fast enough.
Usenet is, once again, a lesson here. Remember cancel messages. Then remember forged cancels. Then remember the people who decided to stop respecting cancel messages. Then remember the discussions about signed cancel messages. And so on. (-:
I wonder: if every post to this network required one added confirmation step ("warning, you are about to make this public without the possibility of removal; are you sure you want to post this?"), would that better educate people and make them less likely to make public what they will later wish was private? I think the solution is also partly a culture and user-education issue. Basically, teach people more about how the Internet works and they're less likely to want to post information on it.
what happens "in real life" when two people go by the same name?
Scuttlebutt works the same way: anyone can name themselves anything, and anyone can name other people anything; it's up to the client how to interpret those messages. More on how SSB embraces subjectivity: https://youtu.be/P5K18XssVBg.
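A hedged sketch of what that could look like in client code: naming messages (the shape below is an assumption, loosely modeled on the idea of "about" claims) are just statements by some author about some feed, and each client decides which claims it trusts.

    // Sketch only: assumes naming claims of the form
    // { author, content: { type: 'about', about: <feed id>, name: <string> } }.
    function resolveName (feedId, namingMessages, myId) {
      const claims = namingMessages.filter(
        m => m.content.type === 'about' && m.content.about === feedId
      )
      // One possible policy: prefer the name I gave them, then their
      // self-chosen name, then whatever anyone else most recently called them.
      const mine = claims.find(m => m.author === myId)
      const self = claims.find(m => m.author === feedId)
      const other = claims[claims.length - 1]
      const pick = mine || self || other
      return pick ? pick.content.name : feedId   // fall back to the raw key
    }

    const log = [
      { author: '@bobKey', content: { type: 'about', about: '@bobKey', name: 'bob' } },
      { author: '@myKey',  content: { type: 'about', about: '@bobKey', name: 'bob from the sailing club' } }
    ]
    console.log(resolveName('@bobKey', log, '@myKey'))  // => 'bob from the sailing club'

Two people can therefore go by the same name without conflict; the ambiguity is resolved per reader, much like nicknames in real life.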
This is so awesome that it deserves a separate blog post. I tried to contain some excitement and not expand on it with this blog post, but it's really revolutionary, and super easy to adopt.
In that case it would work. I was under the impression that you'd be syncing with everyone so someone in New Zealand could contact someone in Canada.
The average person has 208 Twitter followers. So let's say you have 208 'friends', plus a couple of additional 'friends' for each of your original friends: 208 + 2 * 208 = 624 people total.
There are 100 million active Twitter users each day and 500 million tweets per day; that's 5 tweets per person.
5 * 624 = 3120
That's 3120 posts you'll be processing per day. Multiply this by 140 bytes per post and you have 436800 bytes per day or 159.5 MB per year.
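For what it's worth, here is the same back-of-the-envelope estimate in code, extended to the full firehose figure that comes up below. All inputs are the rough assumptions above: 208 followers, two extra hops per friend, 5 posts per person per day, 140 bytes per post.

    // Rough back-of-the-envelope estimate using the assumptions above.
    const feeds = 208 + 2 * 208               // you follow ~624 feeds
    const postsPerDay = 5 * feeds             // 5 posts per person per day = 3120
    const bytesPerDay = postsPerDay * 140     // 140-byte posts = 436,800 bytes/day
    const mbPerYear = bytesPerDay * 365.25 / 1e6
    console.log(feeds, postsPerDay, bytesPerDay, mbPerYear.toFixed(1))
    // => 624 3120 436800 '159.5'

    // The full "firehose" a public pub trying to carry everything would see:
    const firehoseGBPerDay = 100e6 * 5 * 140 / 1e9   // 100M users * 5 posts * 140 bytes
    console.log(firehoseGBPerDay)                     // => 70 (GB per day)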
Except for pubs, which are helpful broadcasters of both private and public stuff; it makes sense to host them on infrastructure that can ingest 70 GB a day and keep a couple of days of retention.
Which means even fewer requirements for such "private" pubs. However, if SSB is ever to replace Twitter, I would guess there would be other, "public" pubs that try to get all the content possible.
Not that this is a bad thing. There's still life in Usenet, and a fair few people still sit and discuss things in various groups (if you know where to look)[1]. The backbone concept of Usenet is still great from a decentralised point of view - someone just needs to add some crypto layers to it (as a standard), and I reckon it could rise again like a phoenix.
---
[1] I'm deliberately and totally ignoring the large elephant in the room with HDDs full of pirate software, media, and porn.
If you make a decentralized, encrypted, potentially anonymous network some users will use it for bootlegs and probably worse. That doesn't preclude or outweigh all the legitimate uses.
History says otherwise. Usenet shows, in the ever growing ratio of binaries to non-binaries traffic from (roughly) the 1990s onwards, that in terms of the literal weight (i.e. the traffic volume) the uses that you talk about very much did outweigh the "legitimate" uses, by an order of magnitude at least.
And the history of Usenet also shows that people do think that the fact that something is mis-used should preclude any use of it. One can look to the history of how several organizations discontinued their Usenet services as a lesson for that, too.
Traffic volume for binaries is higher than that of text because the message size is larger. They are also often chunked into sub-messages that Usenet servers counted as individual messages but semantically are not. Many binaries are legitimate redistribution, although of course not all.
Do you mean to say that there are more binary messages than text messages by an order of magnitude before chunking and that none of those binaries were permitted copies under the law? Because I'm going to need to see evidence to be convinced of that.
Interesting question: "How avoidable is this outcome with or without Scuttlebutt?" In Cuba, USB sticks perform this function, distributing all kinds of media - but mostly movies - throughout society in the face of heavy censorship of every other connection. Admittedly, though, that still requires a kind of common consent. The "and probably worse" stuff couldn't be passed around this way; certainly not so easily as the latest Hollywood movies.
Not sure why this was downvoted. From a bird's-eye view, the whole system looks remarkably similar to Usenet, especially in the old times of UUCP, back when systems were mostly offline and had relatively short timespans to exchange information (via dial-up connection or similar).
What's different now is that we have plenty of disk space, and more than enough computing power to perform proper cryptography.
Probably the same reason that "slack == irc" and anti-systemd comments get heavily downvoted. Younger generations want badly to believe that they've invented something completely novel and unique.
The reason those comments get downvoted is because they are wrong. Anyone writing off slack as an IRC clone exhibits a very poor understanding of either system. Slack is similar to IRC in that they deliver text-based communication, but that's about it. The underlying protocols, features, and so forth are radically different.
Most anti-systemd comments are similarly poorly thought out and articulated.
This might be superficially Usenet+crypto, but oftentimes things are more than the sum of their conceptual parents.
Indeed, that's what I find exciting about the idea. I'm not keen on the implementation, which is built on npm & JavaScript, nor the protocol, which is built on JSON, but those might be worth it if we can restore some of the glory which was mid-90s Usenet, with security, privacy and accountability.
Especially since the implementation details are not set in stone. Once the lower-level protocols start to stabilize, more efficient implementations will probably emerge.
This is great! Exactly the type of service I would use as someone who avoids centralized social networks.
Also wondering, can this be a replacement for Slack? Can I set up a private group chat room? Or can I only use the private @ feature to send private messages to multiple people?
Yeah, you can, and it works quite well, but it's limited to 5 or 7 people or something like that. But having zero metadata about who's involved in the conversation is great; it feels way more secret, shhhhh.
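A conceptual sketch (not the real client API; the "encryption" helper below is a hypothetical stand-in) of why there is no metadata: the recipient list lives inside the content, and the whole content is encrypted to those recipients, so the replicated message shows only an opaque blob.

    // Conceptual sketch only, not the real API. The "boxing" below is a
    // hypothetical stand-in so the example runs; the real thing encrypts
    // the content to each recipient's public key.
    const boxToRecipients = (content) =>
      Buffer.from(JSON.stringify(content)).toString('base64') + '.box'

    const content = {
      type: 'post',
      text: 'our private group chat',
      recps: ['@me', '@alice', '@bob']   // the (small) recipient list travels inside the ciphertext
    }

    const published = {
      author: '@me',
      content: boxToRecipients(content)  // what actually gets replicated: an opaque string
    }

    console.log(published.content.endsWith('.box'))  // => true; no recipients visible

The idea is that only someone holding one of the listed keys can unbox the blob and discover they were a recipient at all, which is why the group size stays small.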
"Scuttlebutt was created by Dominic Tarr, a Node.js developer with more than 600 modules published on npm, who lives on a self-steering sailboat in New Zealand."
This sounds more like a mesh communication tool, or a git repo, than a social network to me. "Social network" sites like Facebook aren't so utilitarian. Facebook was the place where people posted life's highlights. (I say was; I haven't been there in years. I don't know what it is like today.) HN is also like this, except it is a stream of project highlights.
I've been working on something that has some similarities. It's a mesh network of pest traps in the New Zealand bush. Battery life is very important, so each node sleeps most of the time, then periodically wakes up and communicates with its neighbours. It's made more complex by the devices not having real-time clocks.
Once every node in the network is powered down most of the time, I don't think you can consider it a grid.
What I was claiming in my answer was that it was done in a negative sense. I understood it that way because the comment lacked other indications of genuine interest in the matter, which would soften the lack of inflection.
Great point. I didn't consider that the phrase could be interpreted in a criticising sense, but of course you're right that verbally it can be done that way. Writing is hard. :P
I guess rude is in the eye of the beholder? :) Your response seemed rude to me (challenging the use of a pretty common phrase to elicit elaboration), which is why I suggested you were upset. Subtleties and true intention are often lost in written communication... :)
Whether or not you have the "right" to be rude, I think the more cordial we are the more we can align around learning through discussion.
If someone is rude to someone who's mistaken, that could help them unlearn something, but more likely they will become more entrenched because of the consistency principle.
Cialdini talks a lot about the consistency principle in his book "Influence: The Psychology of Persuasion".
I think being rude to someone who's rude to you is childlike and petty. It's how wars start. It's much more productive to either ignore it or try to figure out what's really going on.
It's not always natural and of course we all fail at it from time to time, but justifying poor behaviour because someone else started it just isn't cool.