"It pains me to say this, but I think that differentiating humans from bots on the web is a lost cause."
Ah, but this isn't doing that. All this is doing is raising friction. Taking web pages from 0.00000001 cents to load to 0.001 cents at scale is a huge shift for people who just want to slurp up the world, yet for most human users, the cost is lost in the noise.
All this really does is bring the costs into some sort of alignment. Right now it is too cheap to access web pages that may be expensive to generate. Maybe the page has a lot of nontrivial calculations to run. Maybe the server is just overwhelmed by the sheer size of the scraping swarm and the resulting asymmetry of a huge corporation on one side and a $5/month server on the other. A proof-of-work system doesn't change the server's costs much, but now if you want to scrape the entire site you're going to have to pay. You may not have to pay the site owner, but you will have to pay.
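To put rough numbers on that shift, here's a back-of-envelope sketch; the crawl size, per-page costs, and the human's browsing rate are all illustrative assumptions, not measurements of any real site:

```python
# Back-of-envelope: what per-page friction does to bulk scraping vs. human browsing.
# All figures are illustrative assumptions, not measurements of any real site.

PAGES_IN_CRAWL = 10_000_000    # "slurp up the whole site" scale
OLD_COST_CENTS = 1e-8          # today: fetching a page is essentially free
POW_COST_CENTS = 1e-3          # with proof-of-work: a second or so of CPU per page

crawl_before = PAGES_IN_CRAWL * OLD_COST_CENTS / 100   # dollars
crawl_after  = PAGES_IN_CRAWL * POW_COST_CENTS / 100   # dollars
human_daily  = 100 * POW_COST_CENTS / 100              # dollars, ~100 pages/day

print(f"full crawl, no proof-of-work:   ${crawl_before:.3f}")   # ~$0.001
print(f"full crawl, with proof-of-work: ${crawl_after:,.2f}")   # ~$100.00
print(f"one human's daily overhead:     ${human_daily:.3f}")    # ~$0.001
```

The scraper's bill goes from a rounding error to real money, while the human's stays in the noise.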
If you want to prevent a bot from accessing a page it really wants to access, that's another problem. But that really is a different problem. The problem this solves is people using small amounts of resources to wholesale scrape entire sites that take a lot of resources to provide, and if implemented at scale, it would pretty much solve that problem.
It's not a perfect solution, but no such thing is on the table anyhow. "Raising friction" doesn't mean that bots can't get past it. But it will mean they're going to have to be much more selective about what they do. Even the biggest server farms need to think twice about suddenly dedicating hundreds of times more resources to just doing proof-of-work.
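For concreteness, the general mechanism behind this kind of friction is hashcash-style proof of work. Here's a minimal sketch of that generic idea, not Anubis's actual scheme; the difficulty setting is a made-up knob:

```python
import hashlib
import secrets

DIFFICULTY_BITS = 20  # assumed knob: ~2^20 hashes on average, roughly a second or two of CPU

def meets_difficulty(digest: bytes, bits: int) -> bool:
    # True if the hash starts with at least `bits` zero bits.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0

def solve(challenge: bytes, bits: int = DIFFICULTY_BITS) -> int:
    # Client side: grind nonces until the hash clears the difficulty bar.
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if meets_difficulty(digest, bits):
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    # Server side: one hash to check, so the server's cost is essentially unchanged.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return meets_difficulty(digest, bits)

challenge = secrets.token_bytes(16)  # server issues a fresh challenge per request
nonce = solve(challenge)             # client pays the CPU toll
assert verify(challenge, nonce)      # server checks it cheaply
```

The asymmetry is the point: the client grinds through roughly 2^20 hashes on average, while the server verifies with a single hash, and the difficulty knob sets how steep the toll is.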
It's an interesting economic problem... the web's relationship to search engines has been fraying slowly but surely for decades now. Widespread deployment of this sort of technology is potentially a doom scenario for them, as well as AI. Is AI the harbinger of the scrapers extracting so much from the web that the web finally finds it economically efficient to strike back and try to normalize the relationship?
> Taking web pages from 0.00000001 cents to load to 0.001 cents at scale is a huge shift for people who just want to slurp up the world, yet for most human users, the cost is lost in the noise.
If you're going to needlessly waste my CPU cycles, please at least do some mining and donate it to charity.
Anubis author here. Tell me what I'm missing to implement protein folding without having to download gigabytes of scientific data to random people's browsers and I'll implement it today.
What if I turn off my computer? Does the client save its work (i.e. checkpoint)?
> Periodically, the core writes data to your hard disk so that if you stop the client, it can resume processing that WU from some point other than the very beginning. With the Tinker core, this happens at the end of every frame. With the Gromacs core, these checkpoints can happen almost anywhere and they are not tied to the data recorded in the results. Initially, this was set to every 1% of a WU (like 100 frames in Tinker) and then a timed checkpoint was added every 15 minutes, so that on a slow machine, you never lose more than 15 minutes of work.
> Starting in the 4.x version of the client, you can set the 15 minute default to another value (3-30 minutes).
caveat: I have no idea how much data "1 frame" is.
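For anyone curious what that kind of timed checkpointing looks like, here's a generic sketch, not the actual Tinker/Gromacs core code; the file name, format, and per-frame work are placeholders:

```python
import json
import os
import time

CHECKPOINT_FILE = "wu_checkpoint.json"   # placeholder name, not F@h's actual format
CHECKPOINT_INTERVAL = 15 * 60            # seconds; the 15-minute default mentioned above

def load_checkpoint() -> int:
    # Resume from the last saved frame, or start at frame 0.
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["frame"]
    return 0

def save_checkpoint(frame: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"frame": frame}, f)

def run_work_unit(total_frames=100, compute_frame=lambda i: time.sleep(0.1)):
    frame = load_checkpoint()
    last_save = time.monotonic()
    while frame < total_frames:
        compute_frame(frame)             # stand-in for the actual simulation step
        frame += 1
        if time.monotonic() - last_save >= CHECKPOINT_INTERVAL:
            save_checkpoint(frame)       # lose at most ~15 minutes on a crash
            last_save = time.monotonic()
    save_checkpoint(frame)               # record the finished state
```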
You can't do anything useful with checkpoints because of the same-origin policy. Unless you can get browser support for some sort of proof of work that does something useful, that whole line is a non-starter. No single origin involves a useful amount of work.
The problem is that this is going to be all overhead. If you sit down and calmly work out the real numbers for distributing computation to a whole bunch of consumer-grade devices, where you can probably only use one core for maybe two seconds at a time a few times an hour, you end up with it being cheaper to just run the computation yourself. My home gaming PC gets 16 CPU-hours per hour, or 57,600 CPU-seconds. (Maybe less if you want to deduct a hyperthreading penalty, but it doesn't change the numbers that much.) Call it something like 15,000 people each running a couple of these 2-second computations per hour just to match that one machine, plus coordination costs, plus serving whatever data goes with the computation, plus infrastructure for tracking all of it and collecting results, plus, if you're doing something non-trivial, a quite non-trivial portion of that "2 seconds" will be wasted setting the work up and then throwing it away. The math just doesn't work very well. Flat-out malware trying to do this on the web never really worked out all that well; adding the constraint of doing it politely and in such small pieces doesn't work either.
And that's ignoring things like needing to be able to prove the work was done for very small chunks. It's basically not a practically solvable problem, barring a real stroke of genius somewhere.
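Spelling that arithmetic out, using the parent's own rough figures (16 threads on the gaming PC, two-second bursts, a couple of bursts per visitor per hour):

```python
# Back-of-envelope from the numbers above: how many polite browser visitors
# does it take to match one gaming PC, ignoring all coordination overhead?

PC_CORES = 16                                   # home gaming PC, counting hyperthreads
PC_CPU_SECONDS_PER_HOUR = PC_CORES * 3600       # 57,600

SECONDS_PER_BURST = 2                           # one polite burst of work in a browser
BURSTS_PER_HOUR = 2                             # a couple of visits' worth per hour

visitors_needed = PC_CPU_SECONDS_PER_HOUR / (SECONDS_PER_BURST * BURSTS_PER_HOUR)
print(f"{visitors_needed:,.0f} visitors/hour to match one gaming PC")   # 14,400

# And that's before subtracting setup/teardown inside each 2-second burst,
# serving the input data, and verifying all the tiny result chunks.
```

Roughly 14,000–15,000 visitors per hour just to equal one desktop, before any of the overhead listed above.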