Thousands of requests per hour? So, something like 1-3 per second?
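(For scale: 3,600 requests/hour is exactly 1/s, and ~10,000/hour is roughly 2.8/s, so "thousands per hour" tops out around 3/s.)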
If this is actually impacting perceived QoS then I think a gitea bug report would be justified. Clearly there's been some kind of a performance regression.
Just looking at the logs seems to be an infohazard for many people. I don't see why you'd want to inspect the septic tanks of the internet unless absolutely necessary.
I love the snark here. I work at a hosting company and the only customers who have issues with crawlers are those who have stupidly slow webpages. It’s hard to have any sympathy for them.
We were only getting 60% of our traffic from bots at my last place because we throttled a bunch of sketchy bots to around 50 simultaneous requests, which still worked out to on the order of 100 requests per second. Our customers were paying for SEO, so the bot traffic was a substantial cost of doing business. But as someone tasked with shrinking the cluster, I was forever jealous of the large share of capacity that was never seen by humans.
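For the curious, the throttle was conceptually just a concurrency cap keyed on user agent. A rough Python sketch of the idea (the UA list and the upstream_call callable are made up for illustration; in practice this sort of thing lived in the load balancer config rather than the app):

    import asyncio
    import re

    # Illustrative list of "sketchy" crawlers; the real list was longer.
    SKETCHY_UA = re.compile(r"(AhrefsBot|SemrushBot|MJ12bot)", re.I)

    # Cap the whole bot class at ~50 requests in flight at once.
    BOT_LIMIT = asyncio.Semaphore(50)

    async def handle(request_headers, upstream_call):
        ua = request_headers.get("User-Agent", "")
        if SKETCHY_UA.search(ua):
            # Bots queue here once 50 of them are already in flight.
            async with BOT_LIMIT:
                return await upstream_call()
        # Humans are never throttled.
        return await upstream_call()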
One of the most common issues we helped customers solve when I worked in web hosting was low disk alerts, usually because the log rotation had failed. Often the content of those logs was exactly this sort of nonsense and had spiked recently due to a scraper. The sheer size of the logs can absolutely be a problem on a smaller server, which is more and more common now that the inexpensive server is often a VM or a container.
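If anyone is wondering what "rotation failed" looks like in practice: the tell is an access log far bigger than any single rotation window should produce. A throwaway sketch of the kind of check we'd run when the low-disk alert came in (the paths and the 1 GiB threshold are purely illustrative):

    import glob
    import os

    # A rotated access log should never get anywhere near this big.
    THRESHOLD = 1 * 1024**3  # 1 GiB

    for path in glob.glob("/var/log/nginx/*.log"):
        size = os.path.getsize(path)
        if size > THRESHOLD:
            print(f"{path}: {size / 1024**2:.0f} MiB -- rotation probably failed")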