Why not adhere to the HTTP standard for once and return 429 Too Many Requests or 403 Forbidden? 404 Not Found should only be used for “Not Found” errors.
If you follow this post's advice, you can really screw up your search ranking: a major web crawler that gets blocked will see nothing but 404s and may de-list all of your pages.
I think you can still adhere to the HTTP standards (Rails has built-in symbols for those status codes as well).
This article just talks about blocking hosts that make too many requests to a 404 page (I guess because they're crawling?) and how to mitigate that using Rack Attack.
Could you please explain how sending a 404 to clients sending too many requests would adhere to the HTTP standards? I’m not against the blocking itself.
When I say you can still adhere to the HTTP standards, I mean that you just have to adjust the suggested settings in the config so that requests exceeding the throttle get a 429 instead of a 404, like the following:
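# config/routes.rb
# Page served at /429 (the action name is illustrative; "429" is not a valid Ruby method name)
get "429", to: "welcome#rate_limited"

# app/controllers/application_controller.rb (assumed location)
rescue_from ActiveRecord::RecordNotFound, with: :too_many_requests

def too_many_requests
  redirect_to "/429" # the path the blocklist rule watches
end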
# config/initializers/rack-attack.rb
class Rack::Attack
  ...
  blocklist("block 429") do |request|
    # Ban an IP for 1 day once it has hit /429 5 times within 3 minutes
    Allow2Ban.filter("too_many_requests-#{request.ip}", maxretry: 5, findtime: 3.minutes, bantime: 1.day) do
      request.path == '/429'
    end
  end
  ...
end
# config/initializers/rack-attack.rb
# Response sent to blocklisted clients: 429 instead of 404
Rack::Attack.blocklisted_responder = lambda do |request|
  [429, {}, ["You are blocked. If you are not a bot and think this was a mistake, reach out to us at support@yourdomain.com"]]
end
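To make the `maxretry` / `findtime` / `bantime` parameters more concrete, here is a rough, dependency-free sketch of the same idea. It is not Rack Attack's implementation: `TinyAllow2Ban` and its in-memory counters are hypothetical, the window is a simple sliding one, and the `clock:` argument exists only so the behaviour can be checked without waiting. As with Allow2Ban, the request that reaches the limit still goes through and the ban applies from the next one.

```ruby
# Hypothetical in-memory stand-in for an allow2ban-style rule with a 429 response.
# Not Rack Attack's code; a sketch for illustration only.
class TinyAllow2Ban
  def initialize(app, maxretry:, findtime:, bantime:, clock: -> { Time.now.to_f })
    @app = app
    @maxretry = maxretry   # matching requests allowed before a ban
    @findtime = findtime   # seconds over which matching requests are counted
    @bantime = bantime     # seconds a ban lasts
    @clock = clock
    @hits = Hash.new { |hash, ip| hash[ip] = [] } # ip => timestamps of matching requests
    @banned_until = {}                            # ip => time at which the ban expires
  end

  def call(env)
    ip = env["REMOTE_ADDR"]
    now = @clock.call

    # Banned clients get a 429 with a Retry-After header, whatever they request.
    if @banned_until.fetch(ip, 0) > now
      retry_after = (@banned_until[ip] - now).ceil
      headers = { "content-type" => "text/plain", "retry-after" => retry_after.to_s }
      return [429, headers, ["Too Many Requests\n"]]
    end

    # Count only requests to the watched path, inside the findtime window.
    if env["PATH_INFO"] == "/429"
      recent = @hits[ip].select { |t| t > now - @findtime }
      recent << now
      @hits[ip] = recent
      @banned_until[ip] = now + @bantime if recent.size >= @maxretry
    end

    @app.call(env)
  end
end
```

Wrapped around any Rack-style callable, for example `TinyAllow2Ban.new(app, maxretry: 5, findtime: 180, bantime: 86_400)`, it mirrors the numbers used in the config above.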
The way I interpreted the article, it wasn't about redirecting everything to a 404 page, but about how to handle a massive number of requests to a 404 page!
Almost all the Rails apps I've worked on in the past 18 years don't care about SEO because they don't have much that googlebot can browse without an account. Everything important is either behind a login form or served through an API. The API is probably used by a JS frontend, which googlebot browses, and it's up to the frontend to deal with 404 and other errors in a graceful way. Finally, if the API is machine to machine, there are no worries about SEO.
This is the better way. The redirection happens inside the server. Rack Attack runs early in the request processing and says "do the rest of the processing as if the client's request were for a page that does not exist", then the server acts accordingly.
You haven't been redirected. The redirection happens inside the server; it's never communicated to the client.
Well, almost inside the server. The server first does TLS processing, then this part, then what backend developers think of as "the server". From a backend developer's perspective, this code runs in the zone between browser and server, where load balancers and such live. Operationally, the code runs in the same rack, probably on the same CPU, as the backend code.
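For anyone who wants to see the difference between an internal rewrite and a real HTTP redirect, here is a small sketch in plain Ruby using the Rack calling convention. It is only an illustration of the idea described above, not what Rack Attack actually does internally; `InternalNotFound`, `DemoApp`, and the `/does-not-exist` path are all made up for the example.

```ruby
# Hypothetical Rack-style middleware: rewrites the request path inside the
# server instead of answering with a 3xx redirect.
class InternalNotFound
  def initialize(app, blocked_ips:)
    @app = app
    @blocked_ips = blocked_ips
  end

  def call(env)
    if @blocked_ips.include?(env["REMOTE_ADDR"])
      # Carry on as if the client had asked for a page that does not exist.
      env = env.merge("PATH_INFO" => "/does-not-exist")
    end
    @app.call(env)
  end
end

# Stand-in for the backend app: knows one page, answers 404 for everything else.
class DemoApp
  def call(env)
    if env["PATH_INFO"] == "/"
      [200, { "content-type" => "text/plain" }, ["home"]]
    else
      [404, { "content-type" => "text/plain" }, ["Not Found"]]
    end
  end
end
```

A blocked client asking for `/` gets a plain 404 with no `Location` header, so from the outside there is nothing to follow; the rewrite is visible only to the code that runs after the middleware.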
Turbo[0] has been solving this for years. Quite the contrary, front-end frameworks have started to think "sending JSON is good, but actually sending HTML could be great!".
DHH's presentation[1] at Rails World 2023 is quite interesting in that regard; I recommend you give it a watch (start around minute 16). I am actually very excited about his vision of the web.