AI is killing my website, but in a different way than what's discussed in the article: content scrapers are completely out of control, spiking my serving costs and degrading performance for human users. There seem to be hundreds of them, and they've gotten very good at looking like human users, so they're hard to block or throttle. I can't prove they're all AI-related scrapers, but I've been running the site for 25 years and this only became a problem starting in, oh, late 2022 or so.
Even my barely visited personal website is using almost 10 GB of bandwidth per month according to DigitalOcean. My website is 90% text with no video, so I imagine it's just bots and scrapers hitting it all day. I'm very close to password protecting the whole thing aside from the homepage.
Same, I get next to no value from the personal website these days. Certainly not worth being exposed to the harsh realities of the web in 2025.
A sad state of affairs, but it was predicted decades ago that commercial interests would turn the internet into what it is today, even without AI. Layer on the dead internet theory slowly coming true, and walled gardens almost feel like the last bastion of free internet, rather than being what brought it to an end.
There's probably some nuance there, maybe the walled gardens allowed us to be comfortable letting it get this bad. Either way, what's gone is gone.
I am getting a lot of joy from the local net and just making little devices at home; that gives me the same excitement that the web did in the past!
I've never really thought of this as a strong benefit before but this seems like a good argument for app development in 2025.
Started my career as a web developer and always have a soft spot for it, but from a hobby-developer standpoint, hosting and deploying a site feels like an uphill battle with little upside.
There's a lot to not like about app development, but after you get approval for your app it's pretty hands off.
Heck on android you can just distribute the raw APK on a Google drive. Or just keep the app locally for yourself.
I probably have about 20 self made apps on my phone and they each give me a bit of happiness.
> after you get approval for your app it's pretty hands off.
For about a year, after which point both Apple and Google will arbitrarily remove your app from stores for not keeping up with the platform update treadmill.
Nothing too innovative, I've made some chained, LAN connected lighting for around the house. Some button boxes for sim racing. I built a small CNC machine and a 3D printer, which have let me build various things for the car and the bike. Last week I made a very specialised splint/cover for an injury on my shin, shhh don't tell the healthcare industry about 3D printing.
It's a flat rate. That said, it feels like wasted bandwidth, as I know it's not humans visiting the site. There isn't anywhere near 10 GB of content on my site. 200 MB max.
This is no joke. I run an e-commerce site and have been generally skeptical of WAFs, but had to sign up for a bot mitigation service because bot traffic got out of control two years ago. These bots were even executing JavaScript and loading third-party scripts, which caused additional fees from those vendors. We went with DataDome and are pretty happy with it, but I wish I didn't need to!
Determine how much you need bots to access your website. If you don't need them, block them. This will kill you if you are reliant upon ad revenue.
On my personal site I started blocking bots and also set a "noindex, nofollow" rule to block web crawlers and search bots too. I noticed no change in traffic and still do about 10,000 visits a month.
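For reference, the "noindex, nofollow" rule mentioned above can be set either per page as a meta tag or site-wide as a response header (the header names below are the standard ones; the Apache syntax is one common way to set it):

```html
<!-- Per page, inside <head>: ask compliant crawlers not to index this
     page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

```apache
# Or site-wide via the X-Robots-Tag response header (Apache example,
# requires mod_headers)
Header set X-Robots-Tag "noindex, nofollow"
```

Note this only affects crawlers that honor the rule; the rogue scrapers discussed elsewhere in the thread ignore it entirely.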
No, but you can identify and block them by other means, such as redirecting bots to an empty page on your web server or blocking them from your router.
There are various tricks that can help narrow the criteria, but one that seems particularly effective is to put paths no human user would ever find or traverse in a robots.txt disallow rule. Ban anything that visits such a path.
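The trap above can be automated by scanning the access log for hits on the disallowed path. A minimal sketch, assuming a hypothetical trap path `/trap/` (listed under `Disallow:` in robots.txt but linked nowhere a human would find it) and a common/combined-format access log:

```python
import re

# Hypothetical trap path: disallowed in robots.txt and invisible to humans.
# Anything that requests it read robots.txt and ignored it, or is blindly
# walking every URL it can find.
TRAP_PATH = "/trap/"

# Minimal parser for the common/combined log format:
# 1.2.3.4 - - [date] "GET /path HTTP/1.1" 200 123 ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)')

def ips_hitting_trap(log_lines):
    """Return the set of client IPs that requested the trap path."""
    offenders = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group(2).startswith(TRAP_PATH):
            offenders.add(m.group(1))
    return offenders

sample = [
    '203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET /trap/page HTTP/1.1" 200 512',
    '198.51.100.2 - - [01/Jan/2025:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 1024',
]
print(ips_hitting_trap(sample))  # {'203.0.113.7'}
```

The offending IPs could then be fed to a firewall rule or server deny list; with botnets scraping from tens of thousands of IPs, expect this list to grow quickly.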
When you say "big hitters", I guess you mean the well-known corporate crawlers like GPTBot (one of OpenAI's). Yes, these do tend to identify themselves (and they tend to respect robots.txt, too), but they're a small part of the problem, from my perspective. Because there's also a long tail of anonymous twerps training models using botnets of various kinds, and these folks do not identify themselves; in fact, they try to look like ordinary users. Collectively these bots use way more resources than the name-brand crawlers.
(My site is basically a search engine, which complicates matters because there's effectively an infinite space of URLs. Just one of these rogue bots can scrape millions of pages from tens of thousands of IPs; and I think there are hundreds of the bots at any given moment...)
I've never told anyone about my website (that's under construction and under a subdomain), and GCP is charging me 50 cents/month for egress from multiple continents. Four years ago that would have been 10 cents/month.
Can't you use Captcha services? If the big tech captcha services are too costly, one could create a rudimentary captcha defense against these bots pretty easily. Am I missing something?
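A rudimentary captcha can indeed be sketched in a few lines. This is an assumed, illustrative design (not a hardened one, and trivially solvable by any bot that bothers): the server sends an arithmetic question plus an HMAC of the answer, then verifies the pair on submit without storing any state.

```python
import hmac, hashlib, random

SECRET = b"change-me"  # hypothetical server-side secret, kept out of the page

def sign(answer: str) -> str:
    """HMAC the expected answer so the server doesn't need session state."""
    return hmac.new(SECRET, answer.encode(), hashlib.sha256).hexdigest()

def make_challenge():
    """Return a human-readable question and a token encoding its answer."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"What is {a} + {b}?", sign(str(a + b))

def verify(user_answer: str, token: str) -> bool:
    """Check the submitted answer against the token in constant time."""
    return hmac.compare_digest(sign(user_answer.strip()), token)

question, token = make_challenge()
print(question)
```

The catch, as replies below note, is less whether this stops bots than what it costs you in annoyed human visitors.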
If you throw a captcha in actual users' faces they'll most likely just leave for a competitor that doesn't harass them with busywork, and you'll be left with only bots patiently trying to get past it.
Oh, I've tried captchas, and I can say that they're an awful experience for the humans you accidentally hit, at least for a service like mine that is relatively low value per session (a dictionary website). Within minutes of changing my WAF configuration to captcha users that it thinks are high-probability bots, I'll get angry feedback from users in Singapore (say) who don't want to have to solve a puzzle in order to look up a word in the dictionary. I don't blame them.
I like the Cloudflare challenge ideas suggested on this thread, though, I might try them again.
I'm on AWS and use their WAF service to do some rudimentary bot blocking, but their CDN (CloudFront, the Cloudflare equivalent) has been too expensive in the past, and the bot control mechanisms it offers have too many false positives. Perhaps I should check it out again.
Part of the problem is the economics of it -- I've chosen to self-fund a high traffic site without ads, and that's on me. But it was possible to do this just a few years ago.
> and the bot control mechanisms they offer have too many false positives. Perhaps I should check it out again.
Cloudflare no longer does CAPTCHAs, so even if users get flagged as bots, the user experience isn't terrible. You just have to click a box and you're on your way. It adds maybe 3s of delay, far better than anti-bot solutions that require you to solve a captcha, or Imperva's (?) challenge that requires you to hold a button for 5-10 seconds.
If you're given a button to click, your browser has successfully passed the environment integrity checks and you have not been flagged as a bot.
You'll be flagged as a bot if your browser configuration has something "weird" (e.g. webrtc is disabled to reduce your attack surface) and you will be completely unable to access any site behind cloudflare with the anti-bot options turned on. You'll get an infinite redirect loop, not a button to click.
Note that Google's version of this was determined to be checking whether you had a 9-day-old tracking cookie.
The researcher who discovered this was able to generate 60,000 "I am not a bot" cookies per day, and use them up about 15 times each in a bot before it started getting captchas.
That's probably what it was. So they accessed some page over and over, pretending to not have the cookie yet, got a bunch of cookies, and 9 days later, used them to bypass captchas.
I'm not sure why this is downvoted. The forum I run was hitting monthly bandwidth limits within days because of bots. Changes to htaccess and checking environment data in the PHP code was a cat-and-mouse game. The free Cloudflare tier solved my problem.
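For reference, the htaccess side of that cat-and-mouse game typically looks something like this (illustrative bot names; real deny lists need constant updating, which is exactly the mouse part):

```apache
# Block requests whose User-Agent matches known crawlers (requires
# mod_rewrite; [NC] = case-insensitive, [F] = return 403 Forbidden)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|Bytespider) [NC]
RewriteRule .* - [F,L]
```

As others in the thread point out, this only catches bots honest enough to send a distinctive User-Agent; the ones imitating ordinary browsers sail right through.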