I'm happy to see this article, and it reminds me of things that others have been talking about for some time (for example, the "Redecentralize" community).
I've participated in some file-sharing litigation which has made it very clear to me that decentralized P2P systems are not inherently more anonymous than other technologies. In fact, there's a cottage industry of P2P monitoring companies that participate as peers in the P2P networks and record detailed information about the IP addresses of peers that uploaded and downloaded particular files. There are often paradoxes where decentralization helps privacy and anonymity in some ways but harms it in others -- for example, if you run your own mail server instead of using Gmail, then you've prevented Google from knowing who communicates with whom, but allowed a network adversary to learn that information directly, where the network adversary might not know the messaging relationships if everyone on the network used Gmail.
I guess a related point is that information about who is doing what online exists somewhere by default, unless careful privacy engineering reduces the amount of information that's out there. Making the simplest kinds of architectural changes could just shift the location where the information exists, for example from Google or Yahoo or Amazon to dozens of random strangers, some of whom might be working for an adversary.
The only mechanism I'm aware of that truly allows anonymity over your own connection (or a connection that can be tied to you) is onion routing. On top of that, you must do it from a separate device or isolated VM to prevent hardware fingerprinting.
Anything less than that is like using snake oil crypto: it might make you feel good, but it's not really there.
For email there are various mixmaster-style remailer systems I strongly suspect you're far more familiar with than I am.
A recent talk (I don't recall the conference) on de-anonymising anonymous online communications shows sharp limits to even this, though some work factor is required. Better than nothing.
> Anything less than that is like using snake oil crypto: it might make you feel good, but it's not really there.
While technically true, it doesn't help the situation.
Against the NSA, yeah, you have to be perfect. However, most adversaries are not the NSA.
Encryption on the wire stops random eavesdropping on you while someone else is a target. Having your mail store on a colocated box instead of Gmail/Hotmail/Yahoo means that someone has to get a warrant and physically access your machine rather than filling in an automated request and having it turned over.
It's a modification on the old joke: "Sure, if the tiger is after me, I have to outrun the tiger. But if the tiger is simply hungry, I just have to outrun you."
>> The only mechanism I'm aware of that truly allows anonymity...
We have a need for both solid anonymity and zero anonymity. I think the first step is to be able to authenticate whom you are communicating with, and to reach them without a central authority. After that, you can choose to strip identifying information, or build a web of trust, or anything else. I think privacy can be built on top of an authenticated net, but the reverse is probably not possible. Today we have neither.
Onion routing is an anonymity mechanism for low-latency communications; there could be other mechanisms that are as good or better for some settings of high-latency communications.
Not that you are wrong, but mixmaster routing of email is essentially onion routing at the mail protocol level (as opposed to at the IP level).
I think it makes perfect sense to call it "onion routing of email" or something along those lines -- we generally do talk about "routing emails" (as in from email program to local smtp server, from local smtp server via an ISP smtp server, then lookup via DNS for MX record, on to the gateway smtp server, and so on to the final destination(s)).
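To make the layering concrete, here is a minimal Python sketch of the nested ("onion") encryption idea behind mixmaster-style remailers. It is not wire-compatible with mixmaster (which uses per-hop public keys, fixed-size packets, padding, and pooled delivery); the hop names are made up, and symmetric Fernet keys stand in for each remailer's real keys, purely to show how each hop can peel exactly one layer.

    from cryptography.fernet import Fernet

    hops = ["remailer-a", "remailer-b", "remailer-c"]     # hypothetical hop names
    keys = {hop: Fernet.generate_key() for hop in hops}   # stand-ins for per-hop keys

    def wrap(body, route, recipient):
        # Encrypt innermost-first so each hop can peel exactly one layer.
        blob, next_dest = body, recipient
        for hop in reversed(route):
            blob = Fernet(keys[hop]).encrypt(next_dest.encode() + b"\n" + blob)
            next_dest = hop
        return blob

    def peel(blob, hop):
        # What a single hop learns: the next destination plus an opaque inner blob.
        plain = Fernet(keys[hop]).decrypt(blob)
        next_dest, _, inner = plain.partition(b"\n")
        return next_dest.decode(), inner

    onion = wrap(b"the actual mail body", hops, "alice@example.org")
    dest, inner = peel(onion, "remailer-a")   # dest == "remailer-b"; body still hidden

Each relay learns only the previous hop, the next hop, and an opaque blob, which is the property both onion routing and remailer chains rely on.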
[ed: Not to mention one thing probably stays the same: who runs the best, free onion routers/gateways and mixmaster servers? Intelligence agencies...]
Back in 2008 I was studying P2P networks. I made a BitTorrent crawler by duct-taping together v8 and libevent (there was no node.js at the time). It took about 5 minutes to scan a fresh Dexter swarm of about 100K peers. Then I had all the IP addresses and plenty of metadata (download progress, software used, etc.).
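For a sense of how little machinery peer enumeration takes, below is a rough Python sketch of the simpler HTTP-tracker announce path (the crawler described above walked the DHT, which yields the same kind of peer list). The tracker URL and info_hash are placeholders, non-compact and UDP tracker replies aren't handled, and a real monitoring rig would obviously do much more.

    import socket, struct, urllib.parse, urllib.request

    def bdecode(data, i=0):
        # Tiny bencode decoder (ints, strings, lists, dicts); returns (value, next_index).
        c = data[i:i+1]
        if c == b"i":
            j = data.index(b"e", i)
            return int(data[i+1:j]), j + 1
        if c in (b"l", b"d"):
            items, i = [], i + 1
            while data[i:i+1] != b"e":
                v, i = bdecode(data, i)
                items.append(v)
            if c == b"l":
                return items, i + 1
            return dict(zip(items[0::2], items[1::2])), i + 1
        j = data.index(b":", i)
        n = int(data[i:j])
        return data[j+1:j+1+n], j + 1 + n

    def scrape_peers(tracker_url, info_hash, peer_id=b"-XX0001-abcdefghijkl"):
        # Announce to an HTTP tracker and return the (ip, port) pairs it hands back.
        query = urllib.parse.urlencode({
            "info_hash": info_hash, "peer_id": peer_id,
            "port": 6881, "uploaded": 0, "downloaded": 0, "left": 0, "compact": 1,
        }, quote_via=urllib.parse.quote)
        with urllib.request.urlopen(tracker_url + "?" + query) as resp:
            reply, _ = bdecode(resp.read())
        blob = reply.get(b"peers", b"")   # compact format: 6 bytes per peer
        return [(socket.inet_ntoa(blob[o:o+4]),
                 struct.unpack("!H", blob[o+4:o+6])[0])
                for o in range(0, len(blob), 6)]

    # peers = scrape_peers("http://tracker.example.org/announce",  # hypothetical tracker
    #                      bytes.fromhex("aa" * 20))               # 20-byte info_hash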
Kahle's approach works only for static content. It's not hard to distribute static content; BitTorrent does it just fine. The Internet Archive stores static content. Kahle thinks in terms of static content, because that's what the Internet Archive does. But static content is a smaller share of the Web today. Despite that, it's good to have a way to distribute static content. Academic publishing, after all, is almost all static content. That should be widely distributed. It's not like academic journals pay their authors.
There's the problem that distributing content means someone else pays for storing and serving it. This is part of what killed USENET, once the binary groups (mostly pirated stuff and porn) became huge. There's a scaling problem with replication.
Federated networks are interesting, and there are several federated social networks. A few even have server counts in the double digits. You could have a federated Facebook replacement that costs each user under a dollar a month at current hosting prices. No ads. The concept is not getting any traction.
Kahle wants a system with "easy mechanisms for readers to pay writers." That's either micropayments or an app store, both of which are worse than the current Web.
There are extensions to distributed protocols like BitTorrent, already deployed, that address mutable, non-static content. The approaches I know of place mutable content under the hash of a public key. One such approach is http://bittorrent.org/beps/bep_0044.html, and IPFS supports this technique too.
If you have a single mutable pointer, you can build a feed of data that points at immutable content by its hash, which could replace the data model of twitter, facebook, or many other social networking web services. The benefits to decentralized distribution are huge: native offline functionality, trivially transferable identity, longevity and robustness against providers shutting down, direct commerce without middlemen.
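Here's a loose Python sketch of that "single mutable pointer" idea, in the spirit of BEP 44 but not wire-compatible with it: the key owner signs a sequence-numbered record whose value is the hash of the current feed head, and nodes keep whichever valid record carries the highest sequence number. It uses the cryptography package; the names are illustrative.

    import hashlib, json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    signing_key = Ed25519PrivateKey.generate()
    public_key = signing_key.public_key()      # the feed's identity is this key (or its hash)

    def content_address(blob):
        # Immutable content is addressed by its hash.
        return hashlib.sha256(blob).hexdigest()

    def publish_pointer(seq, head_hash):
        # Sign (seq, value); nodes keep whichever valid record has the highest seq.
        payload = json.dumps({"seq": seq, "v": head_hash}).encode()
        return {"payload": payload, "sig": signing_key.sign(payload)}

    def verify_pointer(record):
        # Raises cryptography.exceptions.InvalidSignature if the record is forged.
        public_key.verify(record["sig"], record["payload"])
        return json.loads(record["payload"])

    post = b'{"text": "hello, decentralized feed"}'
    record = publish_pointer(seq=1, head_hash=content_address(post))
    print(verify_pointer(record))   # {'seq': 1, 'v': '<sha256 of the post>'}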
Payments, or perhaps ISP-style peering arrangements may help with the spam/large binary problem. A big part of distributing the data model will also involve distributing the costs, but this is somewhere non-profits like the Internet Archive can play a very important role.
Why are micropayments worse than the current web? People have differing opinions on the advertising-pays-for-content-so-don't-block-it issue, but are you referring to something technical?
Multiple reasons, though many boil down to Gresham's Law or similar: it's difficult to assess the quality of information, particularly when disaggregated. Most media advances have occurred through bundling rather than unbundling options: magazines, books (collected parchments, monthly serials), subscriptions. Even advertising-supported broadcast and Web models work by aggregating product, though in this case, eyeballs sold to advertisers.
I see a mix of some advertising, patronage, and a content syndication system similar to the existing performance payment model for music (broadcast, commercial establishment use) via ASCAP and the Harry Fox agency as most likely:
https://www.reddit.com/r/dredmorbius/comments/1uotb3/a_modes...
With apologies for not reading all the references you list before asking:
> it's difficult to assess the quality of information, particularly when disaggregated.
Do you mean that it's difficult in general, or in terms of "should I buy this"? I could see micropayments working similarly to Kindle - a 24-hour, no-questions, semi-automatic refund policy. Don't think that article was worth 50 cents? Just "unpay" for it.
It's difficult for a number of reasons, which I spell out in the essay you've failed to read:
Market mechanisms work best where goods are uniform (either individually
or on aggregate average), their qualities are readily determined (or
again tend to average out well), where the fixed costs of production are
low and marginal costs of production high (relative to one another), and
externalities, both positive and negative, are small relative to market
price.
Information goods violate virtually all these assumptions.
● Quality is highly variable.
● Quality assessment is difficult, and often frustrated by other factors (e.g., pay-to-publish journals, "friendly" colleague peer reviews, discussed recently by Joerg Fliege at The Other Place).
● Quality isn't known, and often cannot be known, in advance.
● Variance of individual instances is high enough that averages rarely suffice.
● Fixed costs of production are high, particularly for research, also to an extent for selection, review, and editing.
● Variable costs of production (e.g., publication) are low. In fact we're utilizing a system which was specifically created to reduce those costs still further, Tim Berners-Lee's World Wide Web, developed to transmit physics papers between CERN, SLAC, and other related facilities.
● Information goods typically have very high positive externalities -- they benefit those who don't directly consume them. Occasionally they have high negative externalities -- e.g., smallpox, "superflu", or weapons research.
There's the question of what a content payment scheme provides. The answer is, generally, "an incentive to create content". What matters isn't whether or not each individual information transfer is equitably priced, but whether, at some reasonable interval (e.g., at year's end), you've got sufficient compensation for authors, researchers, reporters, etc., to provide an adequate supply of information and entertainment.
When distribution was on physical media, printing and transactional sales were reasonably appropriate. With the price of reproduction* approaching zero, but nonzero fixed costs of production, there's an inherent conflict in the mechanisms which allow for price discovery in markets.
Right. But it's easy, after reading an article, to decide: this was interesting, I don't need my quarter/dollar back?
Granted most posts are crap, and also probably most of the stuff that's worth a quarter to some is not worth it to most. But publication is free - if one million people don't pay for your book, that's fine if 5,000 pay an average of 20 USD for it?
A: You must decide, for each piece of data or information you consume,
whether and how much it is worth to you. I'll note that in this discussion, not only did you not pay to read a work, but you couldn't even be bothered to click a link to examine it. Which, actually, I applaud as a rational behavior: the odds of being rewarded with quality content by following an arbitrary link posted on an Internet discussion site by someone whose preferred description is "space alien cat" are fairly low.
But it also rather handily demonstrates the specific failure mode of micropayments.
B: Information is paid out of a general cultural tax, apportioned by
wealth or income. You may access as much (or as little) of the availed information as you choose. Creators are paid according to the access and performance of their works, tracked by one or more monitoring services.
Under the first scheme, there are numerous issues: the rich and poor have vastly different information access, as do children. Researchers who might reference many works (though often only in brief fragments) would have tremendous data charges, as might musicians or authors or photographers, who typically have large reference libraries of relevant works. Those who don't directly consume information but benefit by its effects on society as a whole pay nothing. Remixes of works would be difficult to arrange given complex rights negotiations.
Under the second scheme, there's no concern at the time of access whether or not the work is worth paying for (though where you're accessing physical resources or premises, such as a book, a recording, or a performance venue, you would still typically pay). Children and the poor would have as much access as any other. Researchers and artists could reference works as needed without concern as to cost. Remixes of works would be straightforward. Payment would be made regularly throughout the year.
At first blush, scheme A describes what we have today, and scheme B is a utopian broadband tax / content syndication scheme. Actually, this is entirely backwards: scheme B is largely the system we have in place today, except that instead of a government-imposed tax, it's one based on advertisers and paid through higher prices for goods purchased regularly throughout the year. Total advertising spending in the United States is $181 billion per year -- $567 per person in 2014.[1] Artists are paid through either ratings-based metrics for music, or according to negotiated television contracts for actors, screenwriters, and such. There's one key difference though: under an advertising-based system, it's ultimately the advertiser who calls the shots on content, and content is geared to maximise advertising-based appeal. This shapes both the types of works produced and the topics covered.
A broadband tax approach changes one element of this: how revenues are collected. It's either through an access provider (your ISP, cable, or broadband service), or through a public tax imposed independently. Allocate some or all of the $567 per-person advertising cost presently collected, and it would be transferred directly to authors, composers, musicians, actors, reporters, researchers, etc. Without the advertising middleman.
Your micropayments scheme requires a middleman and payment processor, trusted by both creators and consumers, some way of providing for refunds, and the somewhat problematic issue that there are limited capabilities to suck out any information you might have acquired but decided after the fact that you weren't interested in actually paying for. At least, without inflicting possible brain damage. How do I keep you from copying my book, or music, or photos, or movie?
First, I didn't realize the first link was by you - I did read it prior to my previous response.
I think you overestimate the complexity of refundable (for "regret" reasons) micropayments. Both Amazon and Google Play handle this fine.
I don't see why one would need to go back and get a refund for a quarter or dollar spent weeks ago; have "pay to access, refunds as wanted (immediately) after reading". In essence, keep payments in escrow for 24-48 hours, with buyer opt-out.
> How do I keep you from copying my book, or music, or photos, or movie?
Why would you care? Go after systematic/for-profit copyright breach through the legal system - enjoy the rest as free publicity. If a link to pay could be embedded (e.g. a pay:<content-hash> URL scheme) it might be easy (enough) for readers (both in the people sense and the application sense) to opt in to paying. Register the hash in your library when you've decided to pay or refund; have the app commit to paying based on the list that's maintained in your account. The account could be with a payment broker, like iTunes/Amazon/Google - or just a file. In the case of a file you'd need to have the/an app look up the hash and follow some instructions for payment.
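To make the escrow idea above concrete, here's a toy Python sketch: payments are keyed by a content hash (what a hypothetical pay:<content-hash> link would resolve to) and remain revocable for a window before being released. The class, names, and the 48-hour window are all illustrative, not a real payment API.

    import hashlib, time

    REFUND_WINDOW = 48 * 3600   # seconds; buyer opt-out allowed inside this window

    class EscrowLedger:
        def __init__(self):
            self.pending = {}   # content_hash -> (amount_cents, paid_at, payer)

        def pay(self, content, amount_cents, payer):
            h = hashlib.sha256(content).hexdigest()
            self.pending[h] = (amount_cents, time.time(), payer)
            return h            # the hash a pay:<content-hash> link would carry

        def refund(self, content_hash):
            # Buyer opt-out: permitted only while the escrow window is open.
            entry = self.pending.get(content_hash)
            if entry and time.time() - entry[1] < REFUND_WINDOW:
                del self.pending[content_hash]
                return True
            return False

        def settle(self):
            # Release everything whose window has passed to the creators.
            now = time.time()
            released = {h: e for h, e in self.pending.items()
                        if now - e[1] >= REFUND_WINDOW}
            for h in released:
                del self.pending[h]
            return released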
Your concerns wrt monitoring are valid; but not sure they're worse than what we seem to be moving towards.
[ed: re failure of micropayments - wouldn't someone not buying/paying for something they do not want be a win for micropayments? This would be similar for ads anyway? No view; no ad revenue?
But with ads, I can't get back my "ad view". If I read something (or started to) - and realised this isn't interesting - I could "unspend" a micropayment - let those that like content support it - but stop rewarding "eyeballs". Because that seems like a terrible quality measure (or measure of value-add/price).]
That's not the payment transaction. You are not paying for content with ad views. You're paying when you buy products and services which advertise online.
> Why would you care?
Because unlimited recourse to view, then withdraw payment, torpedoes the system.
You've also got the matter that under a syndication system, all views are retail views, regardless of source. Where advertising promotes piracy schemes benefitting publishers at the expense of authors, a syndication scheme would fairly benefit both.
A "net traffic" based system would eliminate this concern: it simply doesn't matter where your work is served from, so long1as it's served. And no, it's not necessary to measure via privacy-invading mechanisms:
1. Zipf power functions mean that a small number of major sites are the bulk of traffic. You monitor these.
2. You're concerned with served traffic volumes. Other than eliminating suspect traffic, it doesn't matter who is accessing content, only how many. Yes, you've got a views-inflation issue to deal with, but there are methods for mitigating it.
3. Sampled traffic estimates are used to model total traffic, and the total funds available are then apportioned across works accordingly (a rough sketch follows below).
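A minimal sketch of that apportionment step, with made-up figures: sampled serve counts per work are scaled up to estimated totals, and a fixed syndication pool is split pro rata.

    def apportion(pool_dollars, sampled_views, sampling_rate):
        # Share the pool in proportion to estimated total views per work.
        estimated = {work: views / sampling_rate for work, views in sampled_views.items()}
        total = sum(estimated.values())
        return {work: pool_dollars * v / total for work, v in estimated.items()}

    payouts = apportion(
        pool_dollars=1_000_000,                   # hypothetical annual pool
        sampled_views={"long-read": 9_000, "news-item": 900, "blog-post": 100},
        sampling_rate=0.01,                       # we observed 1% of traffic
    )
    # Zipf-like skew: the popular work gets the bulk, but everything served gets paid.

(The sampling rate cancels out of the shares; it only matters if you also want absolute traffic estimates.)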
There's probably going to be some level of bundling, and it may be that specific creators market through specific syndicates, rather than directly. But in general, you're going to end up with payment based on actual access and funded through an indirect, general, tax or fee.
Google and Amazon serve only specific markets. There are large parts of the world with some Internet access but limited payment or finance systems. Credit card fraud is a thing, $11.3 billion worldwide in 2012, up 15% from prior year, and breaches of credit and payment data details are running to the hundreds of millions if not billions[1]. Individually transactionalised online payments are a considerable risk. Ecommerce for all its touted benefits remains a modest fraction of total retail, favoured strongly in B2B space, that is, established relationships and regular transactions.
Also noted that you've failed to address access issues for the poor, children, researchers, and creatives themselves, all of which a general fee would cover.
> Why would you care?
>
> Because unlimited recourse to view, then withdraw payment, torpedoes the system.
This sounds a bit like saying libraries are bad for authors? Why would this torpedo the system? Do we know that people would "freeload" to the extent that the system would break down?
> Also noted that you've failed to address access issues for the poor, children, researchers, and creatives themselves, all of which a general fee would cover.
How so? A child can use a parent's account, and researchers and creatives can certainly pay? Researchers might get reimbursed of course -- but that's immaterial.
Certainly the poor can pay some -- granted, many will not be able to. Let them read for free (by "abusing the refund") -- and start paying when/if they're no longer poor?
I don't see how a flat tax would be any better for the poor than direct payment? If a person can afford to spend a dollar, or a thousand dollars, on content each month -- that doesn't change just because you collect a fee based off of bandwidth rather than per-item?
Re: payment fraud -- sure, that's a real problem, and fraud is something every payment system needs to account for. I'm not sure that credit cards/micropayments would be more expensive in this regard than a tax -- there's plenty of tax fraud too.
I'm mostly concerned that your model appears to me to extend the status quo -- where a lot of low quality content gets a lot of views, and generates a lot of the profit -- while it'd be better to have a system that promoted more diverse and "better" content; generally that'd be a slide towards more decentralized publishing/more personal publishing.
Libraries don't allow for unlimited frictionless reproduction. Digital data does.
(Libraries providing content in digital form also do: CDs, DVDs, eBooks).
> I don't see how a flat tax would be any better for the poor than direct payment?
I feel like I'm repeating myself. Oh, because I am. Information is social infrastructure, benefitting all:
● Information goods typically have very high positive externalities -- they benefit those who don't directly consume them.
That is, a tax, particularly one scaled to income:
● Captures the otherwise uncompensated benefit to those who benefit from information even without directly paying for it. Among the benefits I'm referencing is living in an accurately informed public, which requires accurate, relevant, and timely news. Compensation of news media is an ongoing problem. But entertainment matters as well: here's looking at you kid, whether you've paid to watch Casablanca or not.
● Is scaled to income/wealth. Just as the wealthy have benefitted more from society, they pay on that basis. Ditto for information.
● Information is raw material. Any denied access is lost social potential and deadweight cost.
> your model appears to me to extend the status quo -- where a lot of low quality content gets a lot of views
That is a valid concern, and something I've been thinking about. It's addressed in a follow-up article (not listed above):
"Some level of price tiering. Seems that there should be a recognition that some content is considerably more cost-intensive than others. I've been pondering how to address this, and don't have a good metric, though assigning either authors or works to specific categories, and setting compensation tiers appropriately, could help. This might also be a way to avoid the Gresham's Law dilemma of crap content driving out good. If crap content has zero (or even negative) prices, then there's no (or at least less) economic incentive to produce it. Rewarding highly complex work based on its inputs would also be of benefit. To be developed but several models exist."
At the very least, you're breaking out of the blockbuster model and restrictions imposed by advertisers or agents on what gets created in the first place. We're already seeing this to some extent now. Sturgeon's Law applies: 90% of everything's crap. But with sufficient filter and recommendations systems, the good stuff can be found (though addressing that is another challenge).
There's also the issue of addressing quality and especially truth/relevance in informational content (news, research results). That's something else I'm kicking around. It's largely independent of content syndication, though possibly not entirely so.
Thank you for your patient and lengthy replies. I still think we disagree - but I think I'm getting a better grasp on your view/ideas.
How do you see taxation/fees working across borders? Clearly agreements can be made, but it seems much more feasible to piggyback on trade agreements than to erect a new international body of law just for digital content? How does money paid by a Chinese factory worker find its way to an American poet? And vice-versa?
I get the sense I'm not selling you on this, but the conversation may end up getting FAQed ;-)
> How do you see taxation/fees working across borders?
Good question, and one I hadn't really considered, but I'd divide it out into respective chunks:
1. Let's get back to the goal here: it's to provide sufficient compensation to authors and artists of works distributed in electronic form with a minimum of interference, overhead, control, censorship, or privacy invasion, while enabling, facilitating, and encouraging open distribution of such works. It's not "build a perfect utility-capture system", and it's not "prevent every last instance of unauthorized / uncompensated use."
To that end, a first-order solution to foreign use is "it doesn't matter."
But let's say that's not quite satisfactory.
2. The total global GDP is presently about $70 trillion. Roughly half of that by nominal GDP is contained within the G7 nations: Canada, France, Germany, Italy, Japan, the UK, and the US.[1] The G20 (the 20 largest economies) account for 85%.[2] The OECD -- 34 nations -- accounts for ⅔ of global GDP.[3] Again, Zipf power functions are your friend. This puts the lion's share of interest in a subset of the 206 sovereign states of Earth.
As with other international tariff treaties (postal, telecoms), most are based on bidirectional rights. If your country's producing significant informational goods, you'll have higher incentives to strike a deal with other countries. States which produce little by way of intellectual property would of course have little incentive. Most of these would be developing nations.
Note as well: this scheme would be fully compatible with CC or Free Software/OpenSource licensing. Authors/creators could register and be eligible for compensation.
3. And as with broadcast, your interest is finding a representative population to sample*. The largest providers, servers, or ISPs within a given nation. If this is based on some sort of licensing arrangement, you'd be looking at those parties to set up monitoring and enforcement.
So: first approximation, don't worry about it and treat each nation independently.
Second: scale out among largest economies first.
Third: approach largest broadband / connectivity providers first.
Yep, you pretty much explained why the "paid" web never took off in the first place and the free web dominated the world.
Ironically, in the media world of cable and broadcast, both models of content distribution existed: there are public stations, yet there is also HBO. Why an "HBO" never emerged in the web world is a mystery.
When content was distributed via physical goods, the manufacturer served as the "value appraiser", deciding whether to invest in the cost of manufacture. This functioned as price discovery, but it wasn't ideal, since the manufacturer acted only as a proxy on behalf of consumers.
Now that the cost of distribution has fallen to zero, the value appraiser has changed to the advertiser or marketing campaign organization -- in the online world mostly viral, though still largely driven by underground campaigns -- which has replaced the physical media manufacturer/distributor and become the new price appraiser.
No, just the general failure of the pay-to-read model. Other than the New York Times, the Wall Street Journal, and the Economist, few general publications with a paywall make money. They all have large, worldwide reporting staffs.
Nobody is going to pay to read your blog.
Pando Daily is trying pay-to-read. It's too soon to tell how that will work out.
Text is a rather small part of web traffic. Most bandwidth nowadays is used to deliver multimedia content (music and movies), much of it in real time.
Several people have written articles about how the Internet has been and is being changed to work as a more efficient video distribution network, which is a long way from the original idea.....
Some of that is paid for by subscriptions (eg Netflix, Spotify) though I assume the vast majority is -- like most radio and television -- paid for by advertising.
There absolutely should be distributed short-form knowledge. Only metadata needs to be exposed in order to find small facts. I don't think there is any need for payment per-datum for such knowledge. This kind of static content is simple to trust as well.
Long-form knowledge, in contrast, still has massive value and many channels exist for distribution and consumption already.
If we base the web of trust on facts combined with people then we can walk the short web and exit at the long-form.
However nobody wants to create a web of trust based on a system that encourages selfies and ephemeral knowledge -- we've tried that. And this is where things get interesting.
I'm all for this, and consider it to be inevitable in the long run. In the short term, however, it seems like the major hurdle will be getting one of these projects into the mainstream: for the most part, the web already does what most people want it to do, and those people aren't going to be bothered to install a new web browser so that they can do things they're already doing. Especially if it lacks the features, performance, or ease-of-use of their current browser.
So, how do we address this? Is there a "killer app" for the distributed web that will motivate people to move to it? Can we use existing web tech like Web-RTC to bootstrap the system? Maybe a workable avenue is mobile, where people are pretty comfortable installing new applications - what if we built the next social network into an app based on the distributed web?
I don't know the answer, but I'd love to hear any ideas/brainstorming you clever people have to offer.
Implementing other protocol handlers into existing web browsers and operating systems would be a good start IMO.
Taking ipfs/ipns [1] as an example, having handlers inside web browsers would allow people to link from http[s]:// to ipfs:// and vice versa in a seamless way, lowering the barrier to migration.
From there on, there's nothing preventing you from distributing your application code (html/css/javascript/whathaveyou) over ipfs, and make use of WebRTC for user-to-user interactions.
Obviously http[s] is going to stick around for a while as it has its use cases (basically anything that deals with a centralized service, from online banking to search engines to apis), but having a secondary, peer to peer means of distributing content and applications would be a major plus.
[1] http://ipfs.io (they have a working implementation in Go with an HTTP gateway as well as a FUSE filesystem)
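As a stopgap until native handlers ship, ipfs:// and ipns:// links can be rewritten to an HTTP gateway (ipfs.io runs a public one, and a local node exposes its own). A small illustrative Python sketch; the gateway hosts here are examples, not requirements:

    from urllib.parse import urlparse

    GATEWAY = "https://ipfs.io"   # or a local node's gateway, e.g. http://127.0.0.1:8080

    def to_gateway_url(url):
        # Map ipfs://<hash>/<path> and ipns://<name>/<path> onto gateway paths.
        parts = urlparse(url)
        if parts.scheme in ("ipfs", "ipns"):
            path = parts.path.lstrip("/")
            return f"{GATEWAY}/{parts.scheme}/{parts.netloc}/{path}".rstrip("/")
        return url   # already http[s], leave it alone

    print(to_gateway_url("ipfs://QmHashOfSomeContent/index.html"))
    # -> https://ipfs.io/ipfs/QmHashOfSomeContent/index.html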
That's been the goal of the Freenet project for a while, to build a distributed encrypted network protocol. It distributes storage and processing which is why full encryption is necessary; you don't want 10 people reading your email when it's distributed across their machines.
The challenge for Freenet has been speed and fun. To have something like Facebook you have to download a JAR plugin for Freenet that adds that capability. That's not fun. The speed is slow because of the encryption and constant syncing.
It might be better to look at the MediaGoblin and Pump.io (and StatusNet to some extent) for ideas on federated platforms. The challenge there again is fun; it isn't fun to set things up.
Am I the only one who is scared to even try things like Freenet and Tor, for the risk that someone will somehow transmit illegal content across my connection? I'm not talking about content piracy, but the types of things that I don't even want to type for fear of having the terms associated with my username.
You aren't the only one, but with Freenet it's fully encrypted. Let's say you had a Freenet Silk Road application. You won't know it's a Silk Road web page that's being saved along with images of marijuana to your computer unless you go through an indexer/search site and even then you still won't know that those bits of data are stored specifically on your machine.
So in order for the cops to know your machine was used to store the drug listings, the cops would have to spy on your machine and crack the encryption of the Freenet protocol and essentially monitor it. This is why undercover work is important to the police. If no one reports you for the crime of buying drugs and no one discovers the drugs in transit, then the police don't know what's happening. The only way to catch mobsters was through some undercover work and hoping that someone in the criminal network would squeal. If one criminal says the other 10 criminals actually had a hand in committing a crime, the police have more to investigate and can build a case.
If you're not buying drugs or selling drugs and the data related to the drug listings is encrypted when stored on your machine and encrypted when served from your machine, you may be unknowingly helping a criminal to buy or sell drugs. But I'm not sure how that's discoverable by the police and I'm not sure how it would be turned into a criminal investigation. By the argument that you're allowing this, then all ISPs and cell providers are in big trouble because they also enable drug dealing.
There are some horror stories for Tor node operators though.
It's important to note that most of the horror stories involving Tor nodes are related to "exit" nodes. These are the nodes that bridge the open internet with the Tor network. As such, you can see what traffic is traveling through them much more easily.
It's not recommended to run exit nodes if you don't know what you are doing and have a fair bit of resources (time/money) to spend on it. Without exit nodes Tor doesn't work, but they are risky to run because you can be held liable for the content traversing the machine.
Not talking about the legal aspects, but, (and I really, really hate to bring up the "think about the children!" argument here) what about if I am unknowingly helping people who create and share child porn?
It doesn't matter (to me) if I am on the hook for it or not, I just don't know (ethically?) how I would feel if I knew that was going on via my PC. Drugs I don't give a shit about, and I hate how the "think about the children" people screw our rights to privacy, but still...
Honest open question.
Edit: PS, I want you guys to keep doing what you're doing. I completely believe in an open free web, and I want to play my part... I hate the idea of the open web turning into a bunch of mini AOLs... Which is where we seem to be heading at the moment.
I think an important realization to make is that you can't fully stop behaviour that you find unethical. If somebody has an incentive for doing it, then it's going to happen, somewhere, somehow, in some way.
Therefore, the equation changes - it's not about what least accommodates those with (in your view) unethical behaviour, but about what most accommodates those with ethical behaviour.
That is why highways and Tor make sense, from an ethical point of view, despite them being used for things you ethically disagree with - because those things would happen regardless (there's incentive after all), and you're simply making ethical behaviour easier.
A similar equation applies to DRM, actually, and to why it doesn't and can't work. Those with 'bad' intentions (ie. pirates) have the incentive to break it anyway - financial incentive for commercial pirates, "for the fun of it" for non-commercial pirates.
Your actual customers, however, don't have that incentive, and to them it's an insurmountable wall that they can't get over, even though all they wanted to do was fix a bug that you as a vendor hadn't had time to follow up on yet.
Not using DRM wouldn't change anything about the 'unethical' behaviour - they were going to pirate it anyway - but it would make things better for those with 'ethical' behaviour.
I don't know. I don't want to fully stop pedophiles, as I know that allows for draconian measures which hurt everyone. I know it will happen - always has, always will - but I don't want to automatically make it easier for peds. At least now they have to try hard to be safe, therefore (and I may be completely wrong here) the borderline wannabe pedophile won't go fully fledged, because there would be hurdles. And at the same time I realise that this thinking means that normal peeps who just want some modicum of privacy also have to jump through hoops, which is why this whole thing is a problem for me.
Also, I think the whole DRM analogy doesn't work in this case, though I do see where you're coming from.
With my problem, the equivalent would be if adding drm made it automatically easier and safer for pirates, not just crippling legit users...
Maybe think of it this way, if you were a construction worker and helped build an interstate, you'd be helping all sorts of criminals do all sorts of horrible things. Plus innocent people would die in car crashes on the road you helped to build. But we can agree that the utility of an interstate far outweighs these drawbacks.
I get the analogy, but it just doesn't quite... fit this problem for some reason (maybe because there are a lot of crimes I feel aren't crimes, and crashes are just random fate most of the time).
Plus I could think of numerous ways to argue against the analogy that would literally make me sound like a "think of the children!" person...
Maybe I just need to think more about my internal attitudes and justifications for certain things in a more critical, rational way.
When the internet first appeared many years ago, it was labelled the "bad" thing too, and it still is in many parts of the world. The "think of the..." argument is not new and has appeared throughout human history every time there is something new.
Back then, the Internet literally made many bad things easier and enforcement harder, and the law had to adapt.
You get the point. Why do you think the internet today is going back to AOL? Technology is always a result of culture.
With a fully distributed model, everyone is essentially running a backbone server. If you don't feel comfortable with such an arrangement, then you'd probably have to opt out. There are plenty of people willing to put up with it, evidenced by the number of companies who have no problem operating the current Internet backbone despite knowing for a fact that their networks are used to distribute child porn and other illegal things. Generally, I think the law is on the side of the distributor, but that's only a legal consolation, not a moral one. I just don't see a way around it. To be fair though, the likelihood of this happening is probably going to be much lower than for a backbone provider, especially if users only serve up content they've consumed themselves (seems like a logical assumption in the distributed system I'm thinking about, but does not have to hold true for all such systems).
I do feel comfortable with being a part of a global backbone server, and you're right, I don't see a way around the moral issue. But I do see a light at the end of the tunnel in the idea of serving what I consume, that is an interesting take on it. But that does also bring up the issue of creating walled gardens (ie niche bubbles) that may not foster the type of Internet I want to see in the future...
For me it's a thorny issue just because of recent articles I've read about the anonymous services really allowing molesters to even set up 'dating' services. As the post above argues about utility outweighing the bad, I want the future great possibilities of an unhindered Web, but I don't want the unheard of new ways people can abuse children. (I've stated before that there are a lot of crimes I don't care about too strongly, but this area gets me, mainly because I have small kids right now)
Perhaps it's better to take the economic perspective rather than the moral perspective. Coin miners are the backbone of bitcoin -- where else would all the resources needed for its cryptography come from, moral supporters?
Any distributed web attempt must learn from bitcoin and understand that an economic ecosystem is required and essential. It beats all pointless endless arguments and finds its way to success.
The burden of supporting free speech is supporting speech that you may disagree with. Reputable civil liberties organizations like the ACLU regularly fight to protect the rights of groups (KKK[0], neo-Nazis[1], etc.) that they disagree with. You do not have to support the content of someone's speech to support their right to free speech.
I am completely fine with free speech. Even though I hate neo-nazis with a passion, I would also stand in line to allow them to speak freely. But I have had many a bruise from fighting the neos when their speech moves to action. And I'm not talking about being an antifa who goes to protests to fight neo-nazis, but when I've seen them beating on people of colour I've jumped right in and fought them tooth and nail. (And because I'm of mixed race myself, I've had to defend my own butt on more than a few occasions)
Back to my original problem, I wouldn't want to ban child porn fiction, or such similar speech, but I would similarly fight tooth and nail to stop pedophiles swapping children amongst themselves with no repercussions. It's enabling the physical actions I have a problem with - and I'm pretty sure the ACLU doesn't defend KKK members who have actually perpetrated a hate crime.
It's a fine line, I know, but to me it's definitely an iron line...
It all reduces down to this: tech X can enable Y evil things, so should we allow X to exist? Interstates were a good example, as traffickers depend heavily on them. Cameras are vital to the spread of child porn online. Printing presses and printers if not that. Pencils if it gets down to sketching. They're even doing animated stuff according to some sources: there goes any non-locked-down video game system. Traffickers also use cars, 18 wheelers, trains, and boats.
The "tool/tech X can aid evil Y" argument is weak because the good from X far outweighs Y. There's always an evil minority. Obsessive people motivated by control or money will find a way to act on those motivations. They'll use whatever tools are at their disposal. We shouldn't stop having freedom, privacy, Internet anonymity, etc. with all their benefits because an abuse might happen. That you possessed or sold a technology someone might have abused is on them, not you. And they're pretty good at getting away with it even without anonymity tools or crypto.
But, believe me, most Americans are happy to abuse the hell out of children with tech so long as it's the foreign children building America's tech for 10-12 hr straight at horrid wages and conditions to make that tech cheaper. ;) What you worry about is at record lows and what I just mentioned is at record highs. American double standard, just them worrying about the wrong things, or both?
I want this for all the reasons they list, but it seems there are huge unanswered questions for anything beyond a permission-less static page. Imagine you are developing a modern web app in the locked open paradigm. Is all system data distributed, including private user data and passwords? The only solution I can come up with is homomorphic encryption, which is not performant enough and still probably leaves a huge timing/structure analysis attack area if anyone can download the database. If I make any mistakes on the database security, the entire DB is already pre-leaked to the world? The final decryption/encryption happens in client javascript, which is a whole other hornets' nest. Besides that, the implication is that I write my entire system stack in client javascript that is exposed to everyone, including any proprietary algorithms or credentials? Even if that was ok, and the system can live in the user cloud, where does system processing that is independent of user activity (scheduled tasks, etc) happen? Again, I want all of these problems to be solved, but they are nontrivial.
"homomorphic encryption, which is not performant enough"
It is fast enough on a per-viewer basis, and in a DHT downloading the database doesn't mean it was all encrypted w/ one key. Each user encrypts his data as needed, or common groups of users encrypt data for each other with each other's keys (toy sketch below).
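A toy illustration of that point, assuming a DHT-like store (a plain dict here) and the Python cryptography package: each group encrypts with its own key before inserting, so "downloading the database" only yields opaque ciphertext for anything you hold no key to. Group names and key handling are simplified for the sketch.

    import hashlib
    from cryptography.fernet import Fernet

    dht = {}   # stand-in for the distributed table
    group_keys = {"family": Fernet.generate_key(), "work": Fernet.generate_key()}

    def put(group, plaintext):
        token = Fernet(group_keys[group]).encrypt(plaintext)
        key = hashlib.sha256(token).hexdigest()   # content-addressed by ciphertext hash
        dht[key] = token
        return key

    def get(group, key):
        return Fernet(group_keys[group]).decrypt(dht[key])

    ref = put("family", b"private note")
    assert get("family", ref) == b"private note"
    # A node storing dht[ref] without group_keys["family"] sees only opaque bytes.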
"If I make any mistakes on the database security"
This is why encryption is the underpinning. Sure you can still leak your private key like you can leak an SSH key today.
"in client javascript"
Nobody would use a distributed network where this was the case. In many cases (i.e. MaidSafe) they are developing a browser plugin for client side to communicate with the backend.
"where does system processing that is independent of user activity (scheduled tasks, etc) happen?"
Many of these now-being-designed systems have a pay-for-computing concept. Granted, several (not all, unless you want to be limited by a single-file-line blockchain forever) have to agree on the results. Give some computing for other computes and get some. As for "scheduled tasks", timing issues are inherently difficult for these systems, and I don't expect the "system" to trigger a job but rather a user to trigger it. Introducing timing into these distributed networks can be hairy.
The real problem that needs to be tackled is a way for the common human to hold his private key in his memory or some other non-digitally-retrievable way.
Thank you for the thoughtful responses! I am still getting my head around some of this, so I love hearing solutions I have not thought of.
"common groups of users encrypt data for each other with each others keys"
I agree, but I think this can quickly lead to massive multiplication of data without careful cryptographic gymnastics. It puts more pressure on the application devs to do it right or more pressure on the network in terms of data if you don't.
"Sure you can still leak your private key like you can leak an SSH key today."
If I leak an SSH key, I can revoke it and only data that attackers have already grabbed is out. In the described paradigm, everything is already out to everyone. It is all or nothing. That might not be a difference from a theoretical point of view, but in practice it is.
MaidSafe is very interesting, thank you! It seems like more of a shared cloud, which is halfway between present cloud computing and the completely distributed utopia described in the article. It solves pretty much all of these issues, with the cost of being a less-centralized network rather than a fully distributed network. Awesome work, I hope they succeed!
You can also change any sensitive data you have. Also, the distributed/open web should not be one without moderation, just without mandated moderation. If I wrote a distributed social network, I would allow the user to choose a moderated "room"/"group" if he wished. This can facilitate deletion of items, but in many distributed systems, they are never deleted anyways. Be it a mostly immutable DHT or the "right to be forgotten" or whatever it is, in decentralized systems you cannot tell people what to do with data you put out there, you can only encrypt it. IMO, we'll still need the public auditable web for acts requiring responsibility for security failures. Users cannot be trusted with their own security nor can they be trusted to determine a bad actor from a good one.
MaidSafe is fully distributed. Each user is a node (i.e. "vault" or "persona" or whatever the proper name is).
One process is a proxy that can also coordinate multiple users editing the same page and a subprocess acts as a DHT node.
You can use a raft-like log of hashes of (pubkey, content) and the previous hash to keep a history of edits in the network (sketched below).
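Something like the following minimal Python sketch of that log: each entry hashes (pubkey, content, previous hash), so the history is tamper-evident and any node can check the chain it received. Signature verification is omitted here for brevity.

    import hashlib

    def entry_hash(pubkey, content, prev_hash):
        return hashlib.sha256(pubkey.encode() + content + prev_hash.encode()).hexdigest()

    def append(log, pubkey, content):
        # Each new entry commits to the hash of the previous one.
        prev = log[-1]["hash"] if log else "0" * 64
        log.append({"pubkey": pubkey, "content": content, "prev": prev,
                    "hash": entry_hash(pubkey, content, prev)})
        return log

    def verify(log):
        # Recompute the chain; any tampering breaks the links.
        prev = "0" * 64
        for e in log:
            if e["prev"] != prev or e["hash"] != entry_hash(e["pubkey"], e["content"], prev):
                return False
            prev = e["hash"]
        return True

    history = []
    append(history, "alice-pubkey", b"<h1>rev 1</h1>")
    append(history, "bob-pubkey", b"<h1>rev 2</h1>")
    assert verify(history)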
The hard part is this: how do you trust the validity of a singular node having a URL you're requesting?
It entails a rating system, and then it becomes the Byzantine generals problem, where the overlay network should be able to tolerate up to a third of its nodes being malicious and claiming they're all trustworthy.
So the "log of hashes of pubkey,content and the previous hash" is conceptually similar to a blockchain, and I think reading into how that works (consensus, trust, etc) would give you some insight into the issues you're describing. You may also find the IPFS project of interest: http://ipfs.io/
I've found IPFS very interesting and recommended it to peers, but it lacks the collaborative editing aspect.
Trusting the initial public offering of a resource is still an interesting issue. IPFS is content-addressable by hash; addresses map to their content in a computable way.
The idea for the distributed hash table in Uroko is that the keys are existing URLs. Imagine thousands of peers all saying they have a new page on the domain "google.com" and you can see what makes this a fun problem to solve.
> Trusting the initial public offering of a resource is still an interesting issue.
As you mention, resources on IPFS are addressed by the hash, so I'm curious what you mean by "trust" here - do you mean that you can't trust the accuracy/validity of the content? I would assume that content on this kind of network is signed by the publishing party, so if the signature checks out against your PKI, you can trust the content.
I'm also curious as to why Git (or a spiritually similar adaptation) doesn't fit the needs of what you have in mind. Come to think of it, I don't think I see the use-case for Uroko - would you mind explaining?
Yes, and thank you for your question. I mean the accuracy of the content. As users of a web that's been embedded in a distributed hash table where URLs are the keys and revisions of content are possible values, we will want some peace of mind if an organised party configured nodes to insert advertisements, for example. This means all nodes have an altruism score associated with their public key, and the system is being designed to help a node perform a distributed summation of another node's altruism score, producing a positive or negative total. It means having some way of verifying whether content you received was good, and some way of rating your act of rating, based on your altruism score.
Another thing is that for obvious reasons initial tests of the network ought to redact all <script> tags from documents before they're ever sent to a browser. It also means manually implementing same-origin policy due to the proxy address being the origin of every script being served to you.
Uroko intends to be a spiritually similar adaptation of git. If you look at the models.py the concept is based on revisions that belong to a path, which belongs to a domain. Think of them as commits on a branch belonging to a project.
So it is a spiritually similar adaptation of Git, but Git isn't an overlay network. An overlay network gives us an addressing scheme to identify nodes independent of their IPv4 or IPv6 addresses (Kademlia gives a routing scheme with a possible 2^160 node IDs), message rebroadcasts, pings, helping peers bootstrap in, transmitting peers you know of in every message, and tolerating node failures. Ensuring no one is left out of what is an ad-hoc system built on an ad-hoc peering arrangement is demonstrably well served by this sort of overlay network.
Also Git is not an HTTPD. Uroko is, and a design goal is to support users simultaneously collaborating on the same document in soft real-time. You should be able to synchronise your cache of the web with friends directly, edit over the lan/vlan, and generally keep popular sites available to nodes in your overlay routing table.
I would love to live in this future -- but where's the incentive for businesses? How do they make more money developing in this way? How do users get more value accessing sites developed in a purely decentralized fashion? How do we avoid JavaScript being the basis for all of this?
Interesting (almost exciting) vision, but I don't see why the majority of existing users would move. They just don't get much value out of privacy, versioning, reliability, etc. They get enough of those things out of Gmail, Facebook, et al for their purposes already.
Who cares? The web preceded the Internet company. I remember, it was a great place full of interesting people and things, not shitty content mills crammed full of ads. A system like the one described means that ordinary people can build the web again, which is great. Businesses can continue as they are.
Ordinary people can and still do build the web. Why would decentralization allow for more of this? It's cheaper and easier than ever to spin up a heroku/do/aws/google/azure instance and put up a website. Things like squarespace and weebly even make it so you don't have to do any programming whatsoever.
In theory I want this decentralized web stuff to succeed but in practice the only killer apps I see are overthrowing governments and kiddy porn. I'd be happy to be proven wrong. From where I stand, decentralization seems like more of a social/product problem than a technical one. If you prove there's a product that end-users want that can't be built or accessed from the current web, people (end-users and developers) will switch.
The interests of society do not revolve around making life as convenient as possible for business. The onus is on business to adapt itself to new conditions.
A distributed web scheme should orient itself around giving its users freedom. All other concerns are secondary.
This may be philosophically correct. But in a non-platonic realm you must somehow fund these sorts of things. If it's not possible to make money on them then it may not be feasible to implement them for the masses. The keyword being masses there. There are all kinds of distributed p2p web projects out there. None of them appeals to a majority; they are all niche products.
Most of these proposals forget the need to fund marketing, promotion, and scale.
Wikipedia is one of the best sites on the web and didn't think in this way. The whole point of such a system is to remove capital requirements (hardware) from the process of building massive websites. It inherently means less funding is required.
Hardware is only a tiny, tiny part of the cost of developing a system for general use by huge numbers of people. If you don't believe me, do a price check on cloud hosting and then keep in mind that cloud hosting must be profitable so the prices you're seeing have some non-trivial margin attached.
Software development isn't free. Sure, OSS developers can donate their time, but there's a selection process that comes into play.
OSS developers do what they do partly because they enjoy it, and so they tend to gravitate to the types of development that are fun. Fun stuff includes deep systems stuff, algorithms, distributed systems, the flashier aspects of UI/UX, machine learning, etc.
Fixing stupid bugs that stem from stupid compatibility issues and adding stupid features for stupid use cases is not fun. Debugging edge cases that afflict 1% of your users occasionally is not fun. Porting to popular but crap platforms is not fun. Supporting legacy platforms and APIs is not fun, nor is maintaining backward compatibility. Accessibility features and translations are not fun. Supporting right-to-left language is not fun. Rewriting your entire already-working app to support the next web fad (e.g. "responsive mobile design") is not fun.
I could go on forever. There is a really really really long tail of these horrors.
This is why OSS rules in the systems/algorithms/etc. space but drools when you get close to the user. This is why every major end user OS, site, or platform is commercial. People have to be paid to torture themselves with that stuff.
Yet these sorts of problems are precisely the ones that make the difference between something only geeks (with time on their hands) would want to use and something regular people who aren't computer experts would want to use (or computer experts without time on their hands).
For a distributed, decentralized system to challenge the silos of the web, it would absolutely need funding. Eliminating hardware and bandwidth costs is easy; eliminating HR costs is not.
There will never be a volunteer-developed mainstream platform for the same reason there are no volunteer-developed mainstream anything elses. To make a truly polished product of any kind requires pain.
Disclosure: I currently run a distributed networking / SDN startup. The core technology has been up and running for years with few modifications. It started as a side project so I haven't kept careful account, but I'd easily estimate that upwards of 90% of the development time spent on this project has been on trying to get it to the point that it can install and run trouble-free on multiple platforms and is easy enough for mere mortals to use. Getting the core plumbing, crypto, etc. working was the most fun and the most intellectually challenging, but it was a small minority of the time spent. Had there been no chance of a commercial application I would have stopped there and it'd be yet another piece of interesting GitHub networking orphanware, because for the love of God who would voluntarily try to port such a thing to Windows?!? let alone Android (it's in C, so enjoy your JNI pain).
Clearly there are many major free software projects which are formed as NPOs and receive adequate levels of funding. These may further be improved through use of quadratic base-level pledging, crowdfunding, consulting work, academic grants and so forth.
Marketing and promotion are perfectly tractable. I do not see how catering to the interests of proprietary service companies will suddenly have the funds funnelling in, in any case. I'm not sure what is meant by "scale".
I think it's obvious that the current web is decentralized, but is heavily server-based. At the same time, there is something about propagating applications across these servers... Russia can ban Reddit but they can't ban Wordpress. For the moment, that is what we are working on at http://platform.qbix.com (and have been for the past 4 years). Making it easy to have a distributed social network the same way bitcoin makes money distributed.
Now, how would you take it further and make the web entirely peer-to-peer, so you wouldn't have to trust servers with your security and politics? You could introduce additional URI schemes alongside http and https for different methods of delivery and storage.
That would be an easy first step, and it would do a lot. It's 2015 and we can't even get XAuth (http://techcrunch.com/2010/04/18/spearheaded-by-meebo-xauth-...) into the browser! (We would need a shared space for storing preferences that websites from any domain could read, as sketched below.)
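To make that concrete, here is a purely illustrative sketch of one way such a cross-domain preference space could be approximated today, in the spirit of what XAuth did: a hidden iframe served from a neutral "hub" domain keeps the data in its own localStorage, and any site talks to it with postMessage. The hub origin, message format, and function name here are all hypothetical, and this glosses over the hard parts (abuse, consent, and who operates the hub) that XAuth itself struggled with.

    // Illustrative only: a cross-domain preference store in the spirit of XAuth.
    // Assumes a hypothetical hub page at https://prefs.example.org/hub.html that
    // answers {op: "get", key} messages from its own localStorage.
    const HUB_ORIGIN = "https://prefs.example.org"; // hypothetical neutral domain

    function readSharedPreference(key: string): Promise<string | null> {
      return new Promise((resolve) => {
        const frame = document.createElement("iframe");
        frame.style.display = "none";
        frame.src = `${HUB_ORIGIN}/hub.html`;

        const onMessage = (ev: MessageEvent) => {
          if (ev.origin !== HUB_ORIGIN) return; // only trust replies from the hub
          window.removeEventListener("message", onMessage);
          frame.remove();
          resolve(typeof ev.data?.value === "string" ? ev.data.value : null);
        };

        window.addEventListener("message", onMessage);
        frame.onload = () =>
          frame.contentWindow?.postMessage({ op: "get", key }, HUB_ORIGIN);
        document.body.appendChild(frame);
      });
    }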
The nicest thing about all this is that I don't need to wait until someone else writes a whole new WWW; my own website is already a small part of the whole big thing. I just make my HTML more machine readable and implement something like pingback (but easier; it's called Webmention -- see the sketch below). With these small building blocks I am, together with others, building a social network that we don't even need to call a social network.
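For anyone curious what "like pingback, but easier" amounts to in practice, here is a rough sketch of sending a Webmention: discover the target page's advertised endpoint, then POST the source and target URLs to it. This toy version only looks for a <link rel="webmention"> tag (the spec also allows an HTTP Link header, which it skips), and the function name is mine, not part of any library.

    // Rough sketch: notify a target page that my page (source) links to it.
    async function sendWebmention(source: string, target: string): Promise<void> {
      const page = await fetch(target);
      const html = await page.text();

      // Naive discovery: look for <link ... rel="webmention" ... href="...">.
      const match = html.match(
        /<link[^>]*rel=["']?webmention["']?[^>]*href=["']([^"']+)["']/i
      );
      if (!match) return; // target does not accept webmentions

      const endpoint = new URL(match[1], target).toString();

      // The notification itself is just a form-encoded POST of source + target.
      await fetch(endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: new URLSearchParams({ source, target }).toString(),
      });
    }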
IndieWeb looks great. I am going to try to get involved with it.
Could you please get in touch with me by email? You can find it at http://qbix.com/about -- just click on "contact". I would like to find out more about this movement ... I'm beginning to participate more in the Offline First, Distributed Web, Mesh Networking and other such movements. Our company has spent 4 years building a platform that would decentralize social networking, because we see it as the catalyst to giving users control of their own data. Most people in the world are just using centralized services these days, and that's directly related to how difficult it is to make a seamless social layer for the web. So I think we're solving a problem parallel to the one bitcoin solved with money. A good solution unleashes new possibilities, like the Web itself did, like Email did.
I really wish that the Internet Archive would provide bulk access to the Wayback Machine dataset. It would allow for a lot of interesting experimentation and research.
Is that even possible? I don't know the latest size of the IA, but it must be ridiculously huge by now (1 billion pages a week added); the bandwidth cost would be massive.
Maybe they could offer a mail-us-a-multi-petabyte-hdd service... Returned a few weeks later full of data :)
It's totally possible; they already have the infrastructure in place and 14PB of data available for download. Unfortunately, the Wayback Machine data is not currently exposed publicly.
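For what it's worth, while the raw Wayback dataset isn't downloadable, as far as I know the per-URL CDX index is publicly queryable at web.archive.org/cdx/search/cdx, which isn't bulk access but is enough for small experiments. A rough sketch, assuming that endpoint and its output=json behavior work the way the public CDX server documentation describes (check the current docs before relying on it):

    // Query the Wayback Machine's CDX index for captures of a URL.
    // Not bulk access -- just the per-URL index, returned as JSON rows.
    async function listCaptures(url: string): Promise<string[][]> {
      const endpoint =
        "https://web.archive.org/cdx/search/cdx" +
        `?url=${encodeURIComponent(url)}&output=json&limit=10`;
      const res = await fetch(endpoint);
      // The first row is a header (urlkey, timestamp, original, mimetype, ...).
      return (await res.json()) as string[][];
    }

    // Example: listCaptures("example.com").then(rows => console.log(rows));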
Just replacing DNS with a decentralised alternative would be a big step, yet it has long appeared to be an impossible one (Zooko's triangle: a naming system can be secure, decentralised, or human-meaningful, but supposedly only two of the three at once).
Something this big requires everyone using the internet to switch to the new system or it will never work, and that will never happen. It's the dancing pig problem.
We can't go back on the decisions that have been made, only go forward.
Data synchronization and memory management are a major flaw in the concept of a distributed web as described by this article. Is the author suggesting taking all the application data that exists on all web servers today and hosting it on each device connected to the network (billions of devices)?
I was liking what the article was suggesting, but then it shamelessly plugged BitTorrent Inc., which is one of those 'big companies' you don't want touching anything related to privacy or freedom.
It's too late for that. What was once the internet is now basically glorified cable TV. At this point, it's pretty much inevitable that it's going full Disney or bust.
Cool, that's a nice way of promoting those technologies, since many people don't understand them.
I wish those things would land at the IETF. I wonder what Snowden thinks about them. It would surely make mass surveillance much harder for the NSA.
A good roadmap for a new distributed web should be broken down by OSI model layer, showing what protocols and technologies exist that need to be replaced, what levels of the OSI model they span, and identifying single points of failure lower in the stack that must be accommodated. Too few people understand how brittle the web is, given its reliance on the "magical" underpinnings of the Internet continuing to "just work".
For example, let's say we want privacy, anonymity and high availability for something fundamental like name lookups. It's not enough to simply replace DNS with namecoin (L7) if there's a critical vulnerability in openssl on linux that could force a fork in the network, possibly leading to existing blocks getting orphaned (L6), if every single session that goes through AT&T gets captured and the corresponding netflow stored in perpetuity for later analysis and deanonymization (L5), or if this application's traffic could be used for reflection amplification attacks (L4) due to host address spoofing (L3). One might try to get around those issues by direct transmission of traffic between network endpoints (asynchronous peer-to-peer ad hoc wireless networks via smartphones or home radio beacons, for example), but then you not only need to deal with MAC address spoofing and VLAN circumvention (L2), but also with radio signal interference from all the noisy radios turned up to max broadcast volume, shouting over one another, trying to be heard (L1), accomplishing little more than forcing TCP retransmissions higher up in the stack.
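To recap the layer-by-layer point above in one place, here is the same stack of concerns written out as a plain data structure (the layer assignments are the ones from the example above, not a formal taxonomy):

    // The failure modes from the example above, one entry per OSI layer.
    const layeredConcerns: { layer: number; name: string; concern: string }[] = [
      { layer: 7, name: "Application",  concern: "replacing DNS with namecoin only helps if the layers below hold up" },
      { layer: 6, name: "Presentation", concern: "a critical openssl vulnerability could fork the network and orphan blocks" },
      { layer: 5, name: "Session",      concern: "carriers capturing sessions and storing netflow for later deanonymization" },
      { layer: 4, name: "Transport",    concern: "traffic usable for reflection amplification attacks" },
      { layer: 3, name: "Network",      concern: "host address spoofing" },
      { layer: 2, name: "Data link",    concern: "MAC address spoofing and VLAN circumvention" },
      { layer: 1, name: "Physical",     concern: "radio interference from everyone shouting at max broadcast power" },
    ];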
And really, what's the point, when you can't even trust that the physical radios in your phone or modem aren't themselves compromised through their fundamentally insecure baseband processor and its proprietary OS? Turns out, what you were relying on to be "just a radio" has its own CPU and operating system, with their own vulnerabilities.
Solving this from the top down with a "killer app" is impossible without addressing each layer of the protocol stack. Each layer in the network ecosystem is under constant attack. Every component is itself vulnerable to weaknesses in all the layers above and below it. Vulnerabilities in the top layers can be used to saturate and overwhelm the bottom layers (like when Wordpress sites are used to commit HTTP reflection and amplification attacks), and vulnerabilities in the lower layers can be used to subvert, expose, and undermine the workings of the layers above them. The stuff in the middle (switches, for example) is under constant threat of misuse from weaknesses both above AND below.
It might be tempting for an app developer to read this blog post and think "Oh wow, what a novel idea! Why is nobody doing this?" But in reality, legions of security and network researchers, as well as system, network, and software engineers around the world toil daily to uncover and address the core vulnerabilities that hinder these sorts of efforts.