But you know the website might have sooper sekret information they want to protect, which is why it's been published on a public website.
Speaking of bullshit restrictions designed to encourage compliance with surveillance, have imgur links just straight up stopped working for anyone else recently? I'm coming from a datacenter IP. I assume it's just some heavy handed part of the cost cutting push they announced.
Verification isn't about keeping secrets, obviously; it's about restricting the velocity of bots and their ability (intentional or not) to degrade your site's performance/availability.
There are too many bots out there that are very inconsiderate and do not limit or throttle themselves.
We have one right now that crawls every single webpage (and we have tens of thousands) every couple of days, without any throttle or limit. It's likely somebody's toy scraper, and currently it's doing no harm, but not everyone has the server resources we have.
The point is: if you are dealing with inconsiderate bots, a captcha of some type is a nearly bulletproof way to stop them.
With that said, Cloudflare usually is smart enough to detect unusual patterns, and present a challenge to only those who they believe are bots or up to no good. If every person gets a challenge, then the website operator is either experiencing an active attack, or has accidentally set their security configuration too high.
I do know the common narrative. FUD -> more snake oil "solutions". I myself rely on a special type of igneous rock that keeps hackers away. In reality:
1. Most sites only have this problem due to inefficient design. You are literally complaining about handling 1 request every 2 seconds! That's like a "C10μ problem."
2. How many IPs are these bots coming from? Rate limiting per source IP wouldn't be nearly as intrusive.
3. There are much less obtrusive ways of imposing resource requirements on a requester, like say a computational challenge.
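To make point 3 concrete, here's a minimal sketch of a hashcash-style proof-of-work challenge (the function names and difficulty numbers are illustrative, not any real service's API): the server issues a random nonce, the client burns CPU finding a counter whose hash clears a difficulty target, and the server verifies with a single cheap hash. No puzzle for the human, just a cost for the requester.

```python
import hashlib
import itertools
import os

def make_challenge():
    """Server side: issue a random nonce the client must extend."""
    return os.urandom(8).hex()

def solve(challenge, difficulty_bits=20):
    """Client side: find a counter whose SHA-256 hash falls below the target."""
    target = 1 << (256 - difficulty_bits)
    for counter in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return counter

def verify(challenge, counter, difficulty_bits=20):
    """Server side: one cheap hash to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

challenge = make_challenge()
answer = solve(challenge, difficulty_bits=12)  # low difficulty so the demo runs fast
assert verify(challenge, answer, difficulty_bits=12)
```

Raising `difficulty_bits` makes each request exponentially more expensive for the client while verification stays one hash for the server, which is the whole appeal versus a visual captcha.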
> You are literally complaining about handling 1 request every 2 seconds
I don't know where this came from. The inconsiderate bots tend to flood your server, likely someone doing some sort of naïve parallel crawl. Not every website has a full-stack in-house team behind it to implement custom server-side throttles and what-not either.
However, like I mentioned already, if every single visitor is getting the challenge, then either the site is experiencing an attack right now, or the operator has the security settings set too high. Some commonly-targeted websites seem to keep security settings high even when not actively under attack. To those operators, remaining online is more important than slightly annoying some small subset of visitors one time.
> crawls every single webpage (and we have tens of thousands) every couple of days
100,000 pages / (86,400 s/day × 2 days) ≈ 0.58 req/sec.
I acknowledge that those requests are likely bursty, but you were complaining as if the total amount was the problem. If the instantaneous request rate is the actual problem, you should be able to throttle on that, no?
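The "throttle on the instantaneous rate" idea can be sketched as a per-IP token bucket (all numbers here are illustrative, not a recommendation): it tolerates the ~0.58 req/sec average from the arithmetic above but caps bursts, so a naïve parallel crawl gets refused while a polite one sails through.

```python
import time

class TokenBucket:
    """Per-IP token bucket: allows bursts up to `burst`, refills at `rate` req/sec."""

    def __init__(self, rate=0.58, burst=20):
        self.rate, self.burst = rate, burst
        self.tokens = {}  # ip -> remaining tokens
        self.last = {}    # ip -> timestamp of last request

    def allow(self, ip, now=None):
        """Return True if this request fits within the IP's budget."""
        now = time.monotonic() if now is None else now
        tokens = self.tokens.get(ip, float(self.burst))
        last = self.last.get(ip, now)
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        self.last[ip] = now
        if tokens >= 1.0:
            self.tokens[ip] = tokens - 1.0
            return True
        self.tokens[ip] = tokens
        return False
```

In practice you'd put this in the reverse proxy rather than the application (nginx's `limit_req` implements essentially this), but the mechanism is the same.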
I can totally believe your site has a bunch of accidental complexity that is harder to fix than just pragmatically hassling users. But it'd be better for everyone if this were acknowledged explicitly rather than talked about as an inescapable facet of the web.
> But if the instantaneous request rate is the actual problem, you should be able to throttle on that, no?
Again, not every website is the same, and not every website has a huge team behind it to deal with this stuff. Spending 30-something developer hours implementing custom rate limiting and throttling, distributed caching, gateways, etc. is absurd for probably 99% of websites.
You can pay Cloudflare $0.00 and get good enough protection without spending a second thinking about the problem. That is why you see it commonly...
If your website does not directly generate money for you or your business, then sinking a bunch of resources into it is silly. You will likely never experience this sort of challenge on an ecommerce site, for instance... but a blog or forum? Absolutely.
Actually I get hassled all the time on various ecommerce sites. Because once centralizing entities make an easy to check "even moar security" box, people tend to check it lest they get blamed for not doing so. And then it gets stuck on since the legitimate users that closed the page out of frustration surely get counted in the "attackers protected against" metric!
I'd say you're really discounting the amount of hassle people get from these challenges. Some sites hassle users every visit. Some hassle users every few pages. Some hassle logged in users. Some just go into loops (as in OP). Some don't even pop up a challenge and straight up deny based on IP address!
And since we're talking about abstract design, why can't Cloudflare et al change their implementations to throttle based on individual IPs, rather than blanket discriminating against more secure users? Maybe you personally have taken the best option available to you. But that doesn't imply the larger dynamic is justifiable.
> why can't Cloudflare et al change their implementations to throttle based on individual IPs, rather than blanket discriminating against more secure users
Cloudflare does not do this - I've made that point several times. The website operator either has the security setting cranked to a paranoid level (which is not the default, btw), or they are experiencing an attack. Those are the only two scenarios where Cloudflare is going to inject a challenge as frequently as you assert.
Normally Cloudflare will only challenge after unusual behavior has been detected, such as inhuman numbers of page requests within a short duration, or the URL/forms are being manipulated, etc. The default settings are fairly unobtrusive in my experience.
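A "challenge only after unusual behavior" policy can be sketched with a sliding-window counter per IP. This is my own toy illustration, not Cloudflare's actual logic, and the window and threshold are made-up numbers a real operator would tune:

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds; a real system tunes these against observed traffic.
WINDOW_SECONDS = 10
HUMAN_MAX_REQUESTS = 15  # few humans open 15+ pages in 10 seconds

history = defaultdict(deque)  # ip -> timestamps of recent requests

def needs_challenge(ip, now=None):
    """Record this request and return True when the IP's recent rate looks inhuman."""
    now = time.monotonic() if now is None else now
    q = history[ip]
    q.append(now)
    # Drop timestamps that have aged out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > HUMAN_MAX_REQUESTS
```

The point of the sketch: normal visitors never see a challenge, because the trigger is behavior within a short window, not identity.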
If you are also complaining about generic captchas on forms and what-not, that's a different thing entirely. Those exist as anti-bot measures, naturally, but also as anti-human measures. We simply do not want a pissed customer to send us 900 contact-us form requests one drunken evening...
> Cloudflare does not do this - I've made that point several times. The website operator either has the security setting cranked to a paranoid level
This is a bit of intent laundering. By Cloudflare providing ridiculous options, some people are going to take it because more "security" must be better.
> Normally Cloudflare will only challenge after unusual behavior has been detected, such as ...
or people using more secure browsers like Firefox with resistFingerprinting = 1. I suspect this is a significant blind spot for site operators. Have you personally tried your own site with RFP=1, TOR browser bundle, VPN from a datacenter IP, etc?
> generic captchas on forms ... exists as anti-bot measures, naturally, but also as anti-human measures. We simply do not want a pissed customer to send us 900 contact-us form requests one drunken evening
My whole point is it's a bit disingenuous to throw out large quantities of things as the argument, when the hassles are often thrown up on the very first request. I'm not complaining about the sites that throw up CAPTCHAs after the third failed login, but rather the ones that do it on the first attempt!
And sure, I don't have a good map of which types of hassles are specifically Cloudflare versus others of their ilk. And I certainly don't know how often Cloudflare doesn't cause problems, as it doesn't stand out. I just know there is too much indefensible surveillance-based user-hassling in general and OP's anecdote is right in line with my standard browsing experience on many sites these days.
> or people using more secure browsers like Firefox with resistFingerprinting = 1. I suspect this is a significant blind spot for site operators. Have you personally tried your own site with RFP=1
Yes, and it is not an issue for us. Again, this is up to site operators to decide for themselves. The defaults are sane, and Cloudflare makes it very clear what each level of their security configuration does. It is up to the site operator to decide how they want their site to behave. Perhaps simply avoid sites that bother you? That list will grow by the day, unfortunately.
> TOR browser bundle, VPN from a datacenter IP, etc
Nobody, and I mean nobody, cares about this traffic. We're in the ecommerce space, so perhaps by that I mean nobody in the ecommerce space cares. We do not want TOR traffic. We do not want random-cloud-ip-vpn traffic. These are more often than not where our fraud bots/attempts originate, and we are not alone.
Recognize that if you are using TOR, or regularly browsing via a datacenter-IP VPN, you are in an extreme minority, and unfortunately lots of folks before you have used these services for bad things.
I personally like TOR, and VPNs. This is no slight against them - but the facts are undeniable here.
> surveillance-based user-hassling
You also referenced canvas-based fingerprinting, and seem to assume that's how these things work. Some might, but many are much more dumb than that. Usage-pattern based challenges are fairly simple when you understand what normal traffic looks like.
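As a toy illustration of "usage-pattern based" flagging with no fingerprinting at all (the baseline numbers and the 4-sigma cutoff are invented for the example): compare one visitor's volume against what typical visitors do.

```python
import statistics

# Made-up sample: pages viewed per visitor in the last hour on a normal day.
normal_visitors = [3, 5, 2, 8, 4, 6, 3, 7, 5, 4]

def looks_like_a_bot(pages_this_hour, baseline, k=4.0):
    """Flag a visitor whose volume is k standard deviations above typical traffic."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return pages_this_hour > mean + k * stdev
```

Nothing here knows anything about the browser; it only knows what normal traffic looks like, which is the point being made above.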
OK, fair enough. It won't hold in about six months to a year, though. Because publicly available ML can now solve pretty much any CAPTCHA a human can solve, there's now an incentive to start deploying and improving the already-existing JavaScript-capable bot frameworks.
Most bots are either search engines (of all kinds, not just your Googles and Bings), competitors, or academic/fun projects.
Of the three common types, only one has a serious interest in breaking captchas - but they also have a serious interest in not attracting too much attention by deliberately abusing your services. I.e., if a bot is misbehaving, it's going to get our attention, and we're going to look into what it's doing, where it came from, who operates it, etc., and possibly take some action if appropriate or available. Those actions may not necessarily be limited to the technical space, either...
This is just our experience. Other industries will have their own sets of challenges to deal with.
CloudFlare is usually there to mitigate attacking bots; without it, the site wouldn't be available to view in the first place.
CloudFlare is merely the symptom of a greater set of problems, which it attempts to mitigate.
If you want to be angry about something, be angry that bruteforce attacks are common, guzzle resources and usually yield zero legal repercussions in most cases.
Personally, I have no problem with CloudFlare's bot protection. My problem is with CloudFlare's lack of diagnostics and community involvement to resolve/explain false positives. I have no idea what obscure default setting to change in Firefox to make it work.
Perhaps. Ask your government about it if you genuinely don't think the alternative is going to be far worse. People keep demanding targeted fixes for individual technologies like crypto or AI. Demand solutions for the underlying problem instead.
The centralized solution is going to be a government-owned/controlled MITM service like CloudFlare. No doubt with actual ID for verification.
I don't see the decentralized solution happening any time soon.
Massive attacks existed long before CloudFlare ever did. If you're implying there's a conspiracy that CloudFlare is attacking others directly or indirectly to sell their solutions, I'd be extremely careful as that's defamatory and almost certainly false.
Furthermore, most CloudFlare users only use the free plan and thus cost CloudFlare money. Isn't that curious?