Hacker News
Show HN: “HTTP 419 Never Gonna Give You Up” for bots (bradgessler.com)
262 points by bradgessler on Oct 28, 2021 | 71 comments



I definitely agree bots are underserved. I have a few things I do to keep them entertained: ssh bots are tar-pitted to keep them connected but busy, and my hope is that I occupy at least one thread, if not the whole process.

For wp-login bots I serve them a nice chunk of random (generated by a fuzzer) html in the hopes that 1. it wastes a bit of their bandwidth/memory and 2. it crashes their parser.

In reality I guess bots nowadays are sturdy enough to not get stuck or crash but who knows, feels good to do something :-)

Tarpit instructions https://nyman.re/super-simple-ssh-tarpit/

Wp-login page https://twitter.com/gnyman/status/1181652421841436672?s=20

And I remembered another nice trick which someone else came up with, zip bomb the bots :-)

https://blog.haschek.at/2017/how-to-defend-your-website-with...


Although I think bots should be free to access the same content as humans do, I have a suggestion for your fuzzer anti-bot-spray:

It won't work on the more sturdy samples, but maybe try a GZIP bomb on https streams: https://www.infosecmatter.com/metasploit-module-library/?mm=...
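A rough sketch of the gzip-bomb idea in Python. It assumes the body is sent with `Content-Encoding: gzip`, so a client that honours the header inflates it on its own side. `make_gzip_payload` and the sizes are made up for illustration and do not come from the linked posts.

```python
import gzip
import io


def make_gzip_payload(uncompressed_bytes: int, chunk_size: int = 1 << 20) -> bytes:
    """Return a gzip stream that inflates to `uncompressed_bytes` zero bytes."""
    buf = io.BytesIO()
    zeros = bytes(chunk_size)  # a reusable block of zero bytes
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        remaining = uncompressed_bytes
        while remaining > 0:
            n = min(remaining, chunk_size)
            gz.write(zeros[:n])
            remaining -= n
    return buf.getvalue()


# Hypothetical headers a server might attach so the client decompresses the body.
BOMB_HEADERS = {"Content-Encoding": "gzip", "Content-Type": "text/html"}
```

Calling `make_gzip_payload(10 * 1024 * 1024)` yields a body that is far smaller on the wire than the 10 MiB it expands to. Whether a given bot actually inflates it depends on its HTTP client.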


Could there be legal repercussions for doing this?


It's your server and someone is accessing it. It's up to you what you serve them.

If you want to be clear, you can put the gzip bomb behind a link that says "do not click, gzip bomb". The bot won't know the difference.


Pre "guy views html source gets home raided for haxx0ring" I'd have said "you silly!"

Now... I'd say "there shouldn't be, it's your server, people can choose to access it or not, but if the right kind of fool comes along, there's no knowing where the stupid ends."


There are some very cool ways of doing these tarpits. This for example: https://nullprogram.com/blog/2019/03/22/
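A minimal sketch of an SSH tarpit in Python, not taken from either linked post. It relies on the SSH protocol allowing a server to send arbitrary lines before its version string, provided they do not begin with `SSH-`. The port, delay and helper names are assumptions.

```python
import asyncio
import random


def banner_line(rng: random.Random) -> bytes:
    """One junk pre-banner line; hex digits can never spell b"SSH-"."""
    return b"%x\r\n" % rng.getrandbits(32)


async def handle(reader, writer, delay: float = 10.0) -> None:
    """Drip one junk line every `delay` seconds until the client gives up."""
    rng = random.Random()
    try:
        while True:
            await asyncio.sleep(delay)
            writer.write(banner_line(rng))
            await writer.drain()
    except (ConnectionError, OSError):
        pass  # the client hung up
    finally:
        writer.close()


async def main(port: int = 2222) -> None:
    server = await asyncio.start_server(handle, "0.0.0.0", port)
    async with server:
        await server.serve_forever()

# To run the tarpit: asyncio.run(main())
```

Each connection costs the server one idle coroutine, while the client waits for a version string that never arrives.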


I blocked almost all wp-login bots just using bot fight mode in Cloudflare a few months ago, along with some CF page rules to run an interstitial. It seems to be losing effectiveness over time though, and since I do have WP-login, I wonder how I can implement something like your idea.

Maybe rename the legit login and put this in its place, but that would cause issues for redirects from the legit login link...


Change your login path to something like /custom-admin. Then create a page rule to captcha any attempt to access /wp-login. What traffic other than bots is going to go to the old login page? You can change the login link to go to the new page.


or better yet /custom-admin-07a4b58e-3880-11ec-904e-ba0baece2ff4
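The two suggestions above can be sketched in Python as follows. This is an illustration only: `random_admin_path` and `classify` are hypothetical names, and `uuid.uuid4` (a random UUID) stands in for however the suffix is really chosen.

```python
import uuid


def random_admin_path(prefix: str = "/custom-admin") -> str:
    """Build a login path with a random, hard-to-guess suffix."""
    return f"{prefix}-{uuid.uuid4()}"


def classify(path: str, admin_path: str) -> str:
    """Decide how to treat a request path (labels are illustrative)."""
    if path == admin_path:
        return "login"      # the real login page
    if path.startswith("/wp-login"):
        return "challenge"  # old path: captcha, tarpit, or whatever you like
    return "site"
```

The generated path would need to be created once and stored, not regenerated on every request.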


There are some popular WP plugins that take care of changing the wp-login path.


Every time I read about ssh tarpits I wish I had a reason to set up one in my VPS. Alas it's much easier to use the VPS provider's network access rules to block all incoming traffic to tcp/22 that isn't from my IP.


> "And I remembered another nice trick which someone else came up with, zip bomb the bots :-)"

Just curious, is it legal to host a zip bomb on your website? I would think it would be classified under some kind of Cyber crime....


Legality aside, your web hosting provider may consider it as malicious software / cyberattack activity that breaks their TOS.


Why would that be? It's not even executable code: someone would need to 1. actively request it, 2. actively save it somewhere, 3. actively try to extract it.


If the zip bomb explicitly targets bots it becomes not only a zip bomb, but a mitigation tailor-made to prevent abuse of your platform. Phrase it as the latter and it is probably okay.


It’s a bad idea.


> I’m half joking, but if we can have HTTP 418 I’m a Teapot then there is enough room in the HTTP standard for the more useful HTTP 419 Never Gonna Give You Up error code.

Actually, there was a proposal to formally remove the 418 code, but in the end it was grandfathered in. Unfortunately, unless you convince a lot of people to allow 419, it would no longer be permitted (even in an April Fools' RFC) under the established protocol by which IANA controls the allocation of error codes. IANA no longer allows "joke" allocations unless an RFC explains why that particular code must exist in a non-joking capacity (see 451, which is an homage to Fahrenheit 451 but is also the recommended code for an informed block). Even 418 is technically only reserved, in a way that allows it to be overridden if there is a good demonstration that 418 should be the code for a given error.


  HTTP/1.1 527 Railgun Error
  Server: Ballistic Research Laboratory - CHECMATE
  Date: Fri Oct 29 02:08:03 2021
  Connection: Keep-Alive-overridden
  Authorization: Rules-Of-Engagement-090624-2021-10-29
  Content-Type: uranium/depleted
  Content-Weight: 248kT equivalent


And? So is this the C********* equivalent of C***** forcing web standards without even consulting others?


No. This is the HN execution of a lame joke on my part...


For what it's worth, I loved the joke.


The thing that really disappoints me is that 418 I'm a Teapot isn’t registered—instead it’s reserved as “(Unused)”: https://www.iana.org/assignments/http-status-codes/http-stat..., https://www.ietf.org/archive/id/draft-ietf-httpbis-semantics.... As it stands, I suspect (as one that’s been involved in a couple and examined more back in 2013–2014) that most even vaguely recent HTTP libraries that have some kind of status code enum or constants defined take their data from the HTTP status codes registry, with a single exception for 418 I'm a Teapot.

As far as a 419 is concerned, I’d argue that 418 is already suitable anyway as a joke alternative to the more serious 429/503: “wp-admin.php? I’m not WordPress, I’m a teapot!” (Similar style to the joke about one cow warning another about the mad cow disease in the area, and the other responding that it’s not worried because it’s a helicopter.)


418 is defined in HTCPCP, an _extension_ to HTTP. That's why I never understood why people use it in HTTP. So it makes sense that it's only reserved in HTTP.


WebDAV is also an extension to HTTP. Its additional status codes and methods are added to the corresponding registries.

The entire raison d’être of such registries is to include any extensions; if you only cared about the core stuff, you wouldn’t define a registry.


Another good reason to have a 419 response code:

https://www.419eater.com/

https://en.wikipedia.org/wiki/419eater.com


One day I will make an IoT teapot. It will have an HTTP API that responds with a 418 and legitimize the code once and for all.


IANA controls http codes only insofar as no one has told them to knock it off yet. There's no major interop risk from conflicting (200, 400, 500) codes in the way there is for other namespaces because the semantics are essentially contained only in the first digit.
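To illustrate the first-digit point: the HTTP semantics spec has clients treat an unrecognised status code like the x00 code of its class. A small Python sketch, with a hypothetical helper name:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using only the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]
```

So an unregistered 419 would still read as a client error, and the 527 from the joke above as a server error.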


I use 418 on my gopher server [1] to inform misinformed webbots that they're not talking to an actual webserver. It works remarkably well.

[1] gopher.conman.org
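How such a check might look, as a Python sketch. This is a guess at the general approach, not the commenter's actual implementation: a gopher request is a bare selector line, so anything shaped like an HTTP request line can be answered with a canned 418. The names and the response text are assumptions.

```python
import re

# METHOD, space, target, space, HTTP version: the shape of an HTTP request line.
HTTP_REQUEST_LINE = re.compile(rb"[A-Z]+ \S+ HTTP/\d\.\d")

TEAPOT_RESPONSE = (
    b"HTTP/1.1 418 I'm a teapot\r\n"
    b"Content-Type: text/plain\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"This is a gopher server, not a web server.\r\n"
)


def looks_like_http(first_line: bytes) -> bool:
    """True if the first line of a request looks like HTTP, not a gopher selector."""
    return HTTP_REQUEST_LINE.fullmatch(first_line.strip()) is not None
```

A server would call `looks_like_http` on the first line it reads and send `TEAPOT_RESPONSE` before closing when it returns true.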


Added to my gopher bookmarks[1] - is floodgap aware of you? I've not seen you on the lists of live servers.

---

[1] Shameless plug: http://jaruzel.com/gopher/gopher-client-browser-for-windows


Yes they are. I'm on the list of sites under ".com, .net, .info, .land and .org" (#22).


I realised they would be after I posted my comment - I recognised your username from the Gopher mailing list. :)


What are some good gopher resources?


If you're aware that someone is doing penetration tests on your system, but their probing isn't significantly costing you resources, wouldn't you instead just give some generic response so as not to clue them in to you knowing their intention? There are a lot of people who basically do that with scam callers by just leading them on and wasting the scammers' time.


I used to do something along these lines. If I saw a bot, I would use ACLs in haproxy to serve up some static pages from memory that contained the strings their request was looking for. This of course attracted more bots. It didn't cost me anything aside from making my logs a bit noisier, so I disabled logging for the bots. Then I found a funny side effect: shodan was showing my nodes as vulnerable to many things. That was a blemish, so I disabled the ACLs. In hindsight, and knowing how bot farms work, it wasn't really wasting anyone's time or resources, but it was a fun little learning exercise.


I wonder if zip bomb like responses will still work for the majority of bots

https://blog.haschek.at/post/f2fda


Maybe sometimes, but you would just be the reason some random person said "Dammit my machine blue screened again." or "Why is my machine using so much ram?" The C2 machines would detect this node offline and use a different one. On the plus side, maybe a percentage of those people would re-image their machines and patch them.


Send them redirects to a Russian governmental site. They'll take care of it.


This could be seen as abuse by the .ru and .su folks


Those folks have been actively abusing international laws, sponsoring cybercrime, and responding with “so what?”

That’s what. Deal with it. Build your enclosed cheburashka internets or whatever. I couldn’t care less about hurting their feelings.


Redirect to a honeypot as a service that utterly wastes someone’s time.


You could but it's extra work to build that into the application while you could use a generic off the shelf WAF / IDS type solution that just blocks them. Won't fully stop a targeted manual attack but it is enough to make bots move on to their next target. And it slows down any manual reconnaissance work.


Blocking someone is still more generic than returning a specific HTTP response code specifically designed to inform the other party of your suspicion.


I like the spirit of the idea, but messing with bots and script kiddies is best kept a highly local thing.

You don't need a standardized error code to signal to a red team, you can say "hi" in a number of different ways, depending on what they're poking at. And if everyone is doing the same thing to script kiddies, well, where's the sport in that?


https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#Unof...

419 Page Expired (Laravel Framework) Used by the Laravel Framework when a CSRF Token is missing or expired.


419 error is a Rick roll? Ridiculous. It obviously has to be a once in a lifetime opportunity from a Nigerian prince.


HTTP 420 - Enhance Your Calm, could also be useful here if you are going to be explicitly rate-limiting the client.


Method Failure in Spring.

"Shut The Fuck Up" in my framework.


If it redirects then it should be in the 3xx class


I was hesitant on the redirect. It would probably be easier to demand the spec displays "Never Gonna Give You Up" in the appropriate requested format.


400's are errors caused by the client, I think that fits better.


Probably shouldn't be made an official thing, but it'd be funny to do this on all the various minor manually-adminned sites out there.


Just return random status codes that still render html on every page to troll bots.

https://www.youtube.com/watch?v=I3pNLB3Cq24


a part of me is definitely in favor of this, but another part of me wants to avoid turning http error codes into a meme


No, this is turning a meme into an HTTP error code. You're thinking of:

https://i.redd.it/wmwqgt9kbop41.jpg



You almost got me. ;-) I remembered the timeless advice, though: "XcQ - link stays blue" (or in this case, "black").


Why not redirect the bot to fbi.gov and let them scan that?


If the requirement is that client should follow the redirect, one should not use a 4xx status code. I think “319 never gonna give you up” is more adequate



Superficially a fun idea..

Side effects may include:

* Helping bot authors improve their bot so it won't be identified.

* Revealing how good you are at detecting bots.


I prefer to just return a 404 if I know for sure that it is a bot to try to cheat them


I'll vote for that (but no one asked me). I usually use 418 for similar stuff.


I mean, technically, wouldn't this make bot scanning more efficient?


NGINX has a very nice status 444 that silently closes the connection, which I think serves as a great way to deal with uninvited connections.


Limit request rate and be done with it other than reviewing 429s, 404s, 401s etc?
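One common way to do the rate limiting is a token bucket, sketched below in Python. The technique is swapped in for illustration; the comment does not say which limiter it has in mind, and the class, parameter names and numbers are assumptions.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def status_for(bucket: TokenBucket, now: float) -> int:
    """200 while the client is within its budget, 429 once it is not."""
    return 200 if bucket.allow(now) else 429
```

In a real server there would be one bucket per client address, and `now` would come from something like `time.monotonic()`.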


hell, let's go up to 10000 response codes and sell them to the highest bidding meme of the year.


Scammer prank calls, but for bots


I would like to target the Brave search crawler/bot, but they’re hiding themselves like every other spambot: https://twitter.com/tinusg/status/1453862793933897729?s=21


This says that the index is created from users' own web browsing, not from a bot.


Wait, Brave indexes what you browse? How has there not been some bug yet? Can't imagine that going well...


It all seems quite scammy. They claim it's opt-in, but as a Brave user I can't find the option to opt-in or opt-out.

They look at the searches you do at Google and other search engines. The search terms and the results you click on get sent their way, including metadata from the page.

The content itself is downloaded by their bot (more like a fetcher). That bot has the user agent of a regular browser, so you won't see it.

You also can't specifically block their fetcher. It only adheres to disallow *.



