Breaking the 4Chan CAPTCHA

cherryteastain · 2024-11-29T22:02:00 1732917720

The part about bad Keras<->Tensorflow.js interop is classic Tensorflow. Using TF always felt like using a bunch of vaguely related tools put under the same umbrella rather than an integrated, streamlined product.

Actually, I'll extend that to saying every open source Google library/tool feels like that.

alecco · 2024-11-29T23:23:20 1732922600

related (15 days ago)

https://news.ycombinator.com/item?id=42130881 on Francois Chollet is leaving Google

> "Why did you decide to merge Keras into TensorFlow in 2019": I didn't! The decision was made in 2018 by the TF leads -- I was a L5 IC at the time and that was an L8 decision.

Retr0id · 2024-11-29T22:50:50 1732920650

something something Conway's law

Dachande663 · 2024-11-30T08:00:16 1732953616

Semi-related but I needed a CAPTCHA on my site[0] mainly to block comment form spam and settled on repurposing a fun method I’d seen before. Is definitely not foolproof (or hard at all), but I really liked making it.

[0] https://www.hybridlogic.co.uk/contact

vunderba · 2024-11-30T17:35:05 1732988105

Reminds me of the Doom captcha.

https://vivirenremoto.github.io/doomcaptcha/

Dachande663 · 2024-11-30T17:58:33 1732989513

99% certain this is where I copied the idea from.

winrid · 2024-11-30T08:03:02 1732953782

It says I've been blocked when I try to view that. Not on a VPN.

Dachande663 · 2024-11-30T18:00:40 1732989640

The site runs off of a tiny little server at home so I’ve got some very aggressive firewall rules. Anything from the usual bad countries, certain signatures etc are blocked. Reduced traffic to 1% of previous load.

winrid · 2024-12-02T05:33:27 1733117607

I'm in silicon valley in the USA on Comcast lol

account42 · 2024-12-02T12:23:27 1733142207

That fits the usual bad countries filter alright.

efilife · 2024-11-30T20:46:00 1732999560

What are the bad countries? Russia and china?

aguaviva · 2024-11-30T23:12:48 1733008368

There are no bad countries, only bad streams of traffic.

Some of which have been observed to be associated with certain countries.

imchillyb · 2024-12-01T00:20:53 1733012453

I believe even a cursory examination of recent history would show your premise to be less than truthful.

There are bad actors. There are bad groups of actors. There are bad political regimes of groups of bad actors. There are countries made up of bad political regimes made up of groups of bad actors.

EasyMark · 2024-11-30T16:54:12 1732985652

Are you in a safari browser?

winrid · 2024-11-30T17:30:17 1732987817

Chrome android

chamomeal · 2024-11-30T15:42:18 1732981338

No way, that is a cool fucking captcha!!

tayiorrobinson · 2024-12-02T13:52:03 1733147523

Cool, sure; good, probably not. I've never played Halo so I didn't entirely know what I was doing (do I shoot the blue guys too? it's not letting me through so I guess I do), and I don't doubt some people wouldn't even get what it meant by shoot. And god forbid anyone with disabilities that affect their mouse accuracy, or who needs a screen reader, tries to use it.

Haven't looked at the devconsole but it'd probably be easily bypassed by someone dedicated.

account42 · 2024-12-02T12:22:34 1733142154

Cool as a one-off use on some random blog contact form. Infuriatingly annoying if used somewhere you have to solve it with any frequency.

bawolff · 2024-11-29T23:54:22 1732924462

There is a reason why people moved away from distorted text-based captchas. We are basically at the point where computers are better at them than humans.

https://www.usenix.org/system/files/conference/woot14/woot14... is a paper on the subject I think is really interesting

However, a surprising number of text-based captchas can be solved with a shell script of a few lines: use ImageMagick to convert to greyscale, dilate and undilate (erode), then pass the result to Tesseract.
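To make the "dilate and undilate" step concrete, here is a minimal pure-Python sketch of it on a tiny binary image. It does not call ImageMagick or Tesseract; the function names and the 3x3 neighbourhood are assumptions chosen for illustration. Dilating and then eroding (a morphological closing) fills small gaps in strokes, which tends to help OCR on broken glyphs.

```python
# Sketch of "dilate, then undilate (erode)" on a binary image.
# 1 = ink, 0 = background. A 3x3 square neighbourhood is assumed.

def _pixel(img, r, c):
    """Return the pixel at (r, c), treating out-of-bounds as background."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0

def _window(img, r, c):
    """Collect the 3x3 neighbourhood around (r, c)."""
    return [_pixel(img, r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def dilate(img):
    """A pixel becomes ink if any pixel in its neighbourhood is ink."""
    return [[1 if any(_window(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

def erode(img):
    """A pixel stays ink only if every pixel in its neighbourhood is ink."""
    return [[1 if all(_window(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

def close_gaps(img):
    """Dilate, then erode: fills small holes while keeping stroke extent."""
    return erode(dilate(img))

# A horizontal stroke with a one-pixel gap in the middle.
broken_stroke = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
repaired = close_gaps(broken_stroke)
```

In `repaired`, the middle row is one continuous stroke and the other rows are untouched.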

However there are also sites like https://2captcha.net , so really captchas are more like putting a small min amount of effort.

noprocrasted · 2024-11-30T00:16:04 1732925764

Just because you can technically crack them doesn't mean they're useless.

There's a significant amount of time, skill and effort that went into the solution from this post, and the end result doesn't generalize well (you'd have to start all over for a different kind of captcha).

The vast majority of spammers would not be able to replicate this; those who do would either make money legitimately, or focus their skills on juicier targets (if you have AI/ML skills and want to do nefarious things there are other options that pay much better than spamming).

Such captchas still work well at raising the cost of successful spamming above the expected payoff from said spam.

reaperman · 2024-11-30T03:36:48 1732937808

So, I do this type of AI development for solving CAPTCHAs.

I can't get any real jobs that pay me for my more advanced skills. My primary sins were going to a second/third-tier university and some performance concerns in a portion of my previous roles due to divorce and burn-out. I make $80k/year in government IT, and $30-150k/year as the "AI" guy in a small 2-5 person group that offers a CAPTCHA-breaking API.

The spammers aren't the ones replicating this. They just pay B2B rates (combo of SaaS + Consulting, depending on client needs) to help them remove the roadblocks.

jostinian · 2024-11-30T16:37:17 1732984637

I am a nafri with a PhD and engineering experience (with Europeans). I can't make a good living going the traditional way either, with remote jobs being impossible and no luck landing a visa. I have built custom solutions for big-name EU companies to keep an eye on the competition through scraping. Captcha solving and Cloudflare bypass are a great part of that. Getting back at companies making the UX bad with captchas does feel good also.

benreesman · 2024-11-30T08:44:13 1732956253

If there were a totally 100% aboveboard way to do this in a net transfer of utility from Tessier-Ashopool SA to the typical web surfer I would be a superfan.

HeckFeck · 2024-11-30T12:46:45 1732970805

Why do you do this?

While I can appreciate the technical achievement, you know most users of forums and imageboards don’t want any AI content at all.

KomoD · 2024-11-30T14:11:22 1732975882

> Why do you do this?

Money, obviously. I'd also do it for $30-150k/year

> you know most users of forums and imageboards don’t want any AI content at all.

He's not creating or posting any "AI content"?

HeckFeck · 2024-11-30T14:57:59 1732978679

Okay, but you know his actions are enabling more AI content and spam to proliferate? I hardly think he is making that much money just because legitimate users don't want to fill in a captcha.

brookst · 2024-11-30T15:34:54 1732980894

It’s very easy to opine about the ethics others should have. Different when it’s you and your family and a comparatively easy effort will make a material difference in quality of life. And especially when you know the market need will be met by someone else anyway.

HeckFeck · 2024-11-30T16:23:29 1732983809

So you get a bit richer for less effort... but how do you think moderating legions of spam posts affects the lives of independent website owners, who just want to create communities around the things they love?

Or indeed the users, who have to wade through trash invading their threads?

Or other legitimate users, who now have to answer captchas from CloudFlare just to access their favourite websites?

Ultimately this is a parasitical element, choking the internet. It will kill the things it profits from. Many will give up running these sites, you walk away with your $100k, and no one can ever do it again... you've not created anything of value, but destroyed it.

jrflowers · 2024-11-30T19:26:53 1732994813

> So you get a bit richer for less effort

If $150k/year is “a bit richer” for you you could simply offer to pay that poster that much in exchange for stopping.

idiotsecant · 2024-11-30T16:44:32 1732985072

Yes, people often do unsavory things for money. Is the point you're making that they shouldn't do bad things? Like, what are we even talking about here?

The lesson here is that systems that rely on humans to do the 'moral' thing and fail otherwise are bad systems.

kristopolous · 2024-11-30T21:42:59 1733002979

I had no idea I'd see so many people rising in defense of bots and scammers.

b___________ · 2024-12-01T11:03:11 1733050991

A lot of people on this website are directly responsible for making botting and scamming easier to pull off. It's kind of necessary for them to find justifications for them so they can sleep soundly.

kristopolous · 2024-12-01T14:41:04 1733064064

HN is like that sometimes. You read a thread and think "well I guess I'm alone on what I think is a perfectly reasonable position"

The other one that comes to mind is ancap, which I view as fringe. I read the comment threads sometimes and it leaves me with the impression that I'm the only one that thinks Von Mises and Rothbard are a bit out there.

TZubiri · 2024-11-30T21:56:12 1733003772

This sounds so out of touch, you are comparing the livelihood of a person with the quality of a meme imageboard.

gopher_space · 2024-11-30T21:10:02 1733001002

Let us know who you work for and I'm sure we could find a few stones to throw your way.

Jerrrry · 2024-11-30T16:42:55 1732984975

Hard to think anyone who can't solve a captcha is something other than a parasite.

Parasites can solve captchas; people with accessibility issues and the poorest of people are the ones being locked out.

marcosdumay · 2024-11-30T19:29:30 1732994970

Captchas are not only for stopping people with disabilities anymore. They also stop people using non-approved browsers, people trying to stay anonymous, people coming from the wrong geographic areas...

TZubiri · 2024-11-30T21:58:05 1733003885

If the AI has access to a credit card, but Mgulu from Nigeria doesn't, then the system doing the filtering might evolve to filter out the 'undesirable' rather than the non human.

lostlogin · 2024-11-30T18:40:50 1732992050

It isn’t just those with accessibility problems that get stuck. Some are stupidly difficult- I’ve given up on one in recent times.

Maybe my disability is CAPTCHA blindness.

gosub100 · 2024-11-30T17:44:12 1732988652

HN moderates spam without the use of AI-crackable "prove you're a human" bs

IWeldMelons · 2024-11-30T16:20:10 1732983610

If someone is making a brazen statement of being "a bad guy because 80K is not enough, and could not find anything decent for those extra $30K" what kind of treatment would they expect?

marcosdumay · 2024-11-30T19:35:25 1732995325

To be fair, there's a huge amount of people around here that work on the universal surveillance industry, and for many of them the alternative is way higher than 80k.

xp84 · 2024-11-30T18:23:08 1732990988

I’d argue that someone cracking CAPTCHAs has a lot less dirty hands than someone who works in an actually scummy industry like US health insurance. Those companies literally kill people by denying them care to pinch pennies. This guy might cause a little more spam on the already useless mess that is YouTube comments. Who cares. I’d take the money, too.

IWeldMelons · 2024-11-30T18:31:02 1732991462

Typical US-centric way of thinking. If your healthcare system is crippled it does not justify making internet mess for everyone else in the world.

lostlogin · 2024-11-30T18:44:19 1732992259

I’d have suggested adtech before health insurers.

rad_gruchalski · 2024-11-30T20:48:19 1732999699

So says every drug dealer.

reaperman · 2024-12-01T02:09:36 1733018976

We actually don’t take spam/fraud clients. Granted - there’s a bit of a delicate song+dance with each client, people who are doing spam/fraud usually know better than to admit it outright.

Our group focuses on scraping for lightly-funded LLM startups who can’t pay Reddit/X/etc API fees, and data aggregation for cottage industries (price comparison, etc).

But a LOT of the people in our niche do fraud and spam, so we’re steeped in that culture whether we embrace it or reject it.

gosub100 · 2024-11-30T17:42:00 1732988520

Everyone's got their price. I would certainly do dirty deeds myself for the right amount.

vkou · 2024-12-01T06:50:05 1733035805

Half this forum works on filling the Internet with crap, and most of the other half work for industries that on the net are making the world a worse place, it's kind of table stakes for getting paid.

Capitalism optimises for value to the customer, not for overall public good.

BolexNOLA · 2024-11-30T14:13:56 1732976036

People who do jobs like that don’t really care about the impact

noprocrasted · 2024-11-30T22:50:46 1733007046

The percentage of captchas used to deter spam is probably a minority these days. A lot of captchas nowadays are used to prevent adversarial interoperability or the free flow of information.

If you want to spam, you don't actually need to break many captchas. Just make your spam/scam/misinformation "engaging" enough and the social media platforms will host and promote your spam _for free_ and won't even ask a captcha.

reaperman · 2024-12-01T17:36:53 1733074613

This precisely. Our clients are generally looking at adversarial interoperability.

But the rhetoric around the assumption that I was doing it for spammers was also interesting, as well as being more closely related to TFA!

Der_Einzige · 2024-11-30T17:12:45 1732986765

Being able to spam and mind control 4chan would be amazing just for personal reasons, let alone how juicy that is to governments around the world!

lukas099 · 2024-11-30T17:28:09 1732987689

It's working for the Russians.

ryandrake · 2024-11-30T14:22:15 1732976535

Despite the spamming angle, I think CAPTCHA-breaking is, on the balance, noble and honorable work. These things are user-hostile blights on the web, and any effort towards making them disappear as useless is worthwhile. Sites worried about spam should invest more in automated spam classification/elimination instead of punishing real users with CAPTCHA-solving. Not that I can offer a solution--if I could, I'd be a millionaire.

persnickety · 2024-11-30T19:38:33 1732995513

Who do you think spam classification false positives are going to be punishing if not real users? At least with a captcha, you have some idea that you were rejected before you put in the effort to write your comment.

blackjackfoe · 2024-11-30T04:01:13 1732939273

Is your company hiring? :)

ape4 · 2024-11-30T17:03:29 1732986209

$30-150k/year is a big range

TZubiri · 2024-11-30T21:49:35 1733003375

Ahh the good ole dilemma of selling your soul, you study what you love only to destroy it for profit. Like an entomologist hired by a pesticide company.

I get it man, gotta make the bucks helping spammers advertise their shitty products, even if they destroy the internet.

noprocrasted · 2024-11-30T22:45:39 1733006739

What about the spammers that already destroyed the internet by steering it entirely towards advertising & surveillance capitalism? It's like the pot calling the kettle black.

We're all complicit in the enshittification of the internet and technology in general, just that we delude ourselves into believing we're on the "good" side because we call it "advertising" or "marketing" or "analytics" instead of spam, more spam and spyware.

The end result is exactly the same however.

TZubiri · 2024-12-01T18:20:35 1733077235

If you are referring to google. It's what financially powers most of the free as in beer internet, so hard disagree.

There's a difference between parasites that spam and make 1000 free gmail accounts. And Google.

fragmede · 2024-11-30T02:30:38 1732933838

> there are other options that pay much better than spamming

Are there? Say you've got a felony record and can't get a legit AI/ML job at eg OpenAI/anywhere. What would you do instead? most of the options I can think of involve getting paid for doing things that are basically spam if you zoom out enough.

benreesman · 2024-11-30T08:49:47 1732956587

I’ve got no criminal charges of any kind and I’d still want to know about any way to work without getting flagged as a known enemy of the Cartel.

I’m lucky that some people still want chops no matter the thought crime, I’m very grateful such excellent employers exist (love you guys).

But you’re never sure you’ll line up two such in a row, this isn’t the IBM until company casket and company funeral days. Makes life “interesting” even for a risk-taker.

andrewflnr · 2024-11-30T03:24:02 1732937042

How many people are there like that, and how much damage are they collectively likely to do? If you're a random spammer, how hard will it be to hire that person? Again, not aiming for impossibility, just reducing the damage.

TravisPeacock · 2024-11-30T04:20:35 1732940435

I've been working for myself for over a decade doing random projects for clients while also doing my own thing. My resume looks awful and the job market is trash. I'd be willing to take a job as a jr developer and work my way up (or a sys admin).

I used to run one of the world's largest ebook piracy websites but want to put that life behind me. Recently work came across my desk to create tens of thousands of accounts on a well respected website so they could more easily scrape it.

I just want a traditional job, but I also want to support my family and $4000 for a month's work

supriyo-biswas · 2024-11-30T08:41:09 1732956069

If I were you, I'd probably try looking at companies working in the web scraping and reverse engineering fields, who might even appreciate the skills even if they were acquired in a, let's just say, "different" way.

noprocrasted · 2024-11-30T23:02:05 1733007725

There's nothing immoral about scraping a website. Best of luck to you!

mmsc · 2024-12-01T00:13:25 1733012005

The secret is knowing that companies actually want people like you, with the real world blackhat experience, because you know how the game works in practise not just theory.

noprocrasted · 2024-11-30T03:22:25 1732936945

There's plenty of mischief potential with "deepfakes".

TZubiri · 2024-11-30T21:55:13 1733003713

Interesting, subtle difference but I always thought of captchas as having computational difficulty, but that's clearly not the point as you say. The cost is not compute but developer time.

If you manage to crack it at 1 MHz per captcha or 1 GHz or 1000 GHz, it makes no difference, as the bottleneck is the network identifier (IP address/block).

While still a type of PoW, these economics are different from offline mechanisms like password hashing or crypto, where a 1 GHz cost is still significantly different from 1 MHz.

hamilyon2 · 2024-11-30T13:23:01 1732972981

Captchas are now useful to distinguish well-intentioned bots (they stop whenever they see captcha) from malicious ones, which solve them, but still behave a lot like bots.

Well-intentioned bots are first-class citizens.

brookst · 2024-11-30T15:35:36 1732980936

Wouldn’t a well-intentioned bot follow robots.txt anyway?

hamilyon2 · 2024-12-01T15:48:13 1733068093

Curl and libcurl don't. Wget doesn't, I think.

account42 · 2024-12-02T12:29:43 1733142583

curl isn't a bot, it's a tool. It can be part of a bot (which may or may not respect robots.txt) but it can also just act as a user agent directly for a human operator, in which case it SHOULD just do what it's asked. Chrome doesn't follow robots.txt either, for the same reason.
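For what "respecting robots.txt" looks like when a tool is embedded in a bot, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the user agent name and the URLs are made up for illustration; a real crawler would fetch the file from the target site instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real bot would download this
# from https://<site>/robots.txt before crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# "examplebot" and example.com are placeholder names.
can_read_index = allowed("examplebot", "https://example.com/index.html")
can_read_private = allowed("examplebot", "https://example.com/private/data")
```

The check is entirely voluntary: nothing stops the HTTP client underneath from fetching the disallowed URL, which is the distinction between the tool and the bot built on it.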

lostlogin · 2024-11-30T18:46:44 1732992404

Do you complete the circle and do the good bot bad bot classification with a mod bot?

atomicnumber3 · 2024-11-30T20:01:44 1732996904

The watershed of "good enough at programming to just get a real job" vs "can code enough to be really annoying to businesses, but not enough to hack it as a dev" is a lot more on the annoying side than you'd think.

I say this with the chagrin of someone who works on a cool software product that is also coincidentally really well-shaped to make people want to abuse it.

delfinom · 2024-11-30T10:33:21 1732962801

> The vast majority of spammers would not be able to replicate this;

Eh? They just need to buy their software from someone that can. I would say much of the malware and spamware isn't created by every individual deploying it, but instead by vendors that got good at it and decided to make revenue by licensing out their software to other bad actors.

brian-armstrong · 2024-11-30T03:47:56 1732938476

Makes me wonder what comes next. Could we create a forum where every member must do a 15 minute video interview with a moderator? I know this "doesn't scale" but I think it could make for a funny gimmick.

matchamatcha · 2024-11-30T09:37:29 1732959449

When I was a teenager, I stumbled upon a music forum that required phone interviews for signing up. They had other interesting sign up rules, like you could not have silly user names (judged by the admin). I guess it served as an effective filter for their member base..

lobsterthief · 2024-12-01T00:14:12 1733012052

The silly username thing goes a bit too far though. It just means the admin will subjectively apply other rules. Doesn’t sound like a lot of fun.

jabroni_salad · 2024-11-30T04:33:04 1732941184

private torrent trackers are/were doing that. It was really just to make sure you understood how p2p culture works and what the expectations are, and really easy to pass if you just followed a guide. However, I did see many people fail their interview.

drexlspivey · 2024-11-30T18:52:33 1732992753

The famous RED tracker has a full on technical interview asking about:

* Audio Formats

* Transcoding

* Spectral analysis

and more.

This is the interview prep website: https://interviewfor.red/en/index.html

pimeys · 2024-12-01T08:36:03 1733042163

Or you get an invite from a friend who always has a bunch of them. Although many people don't realize the expectations and can't cope with the demands. RED is awesome, but in 2024 it might be hard to start from scratch.

jmb99 · 2024-11-30T05:13:17 1732943597

Were there ever video interviews? Admittedly I wasn’t really paying attention but back when I was getting into what it was only IRC, and these days it still seems to be IRC anywhere that does interviews (otherwise class-restricted forum invites).

jabroni_salad · 2024-11-30T05:43:40 1732945420

I don't recall ever seeing that. I don't think anyone doing piracy wants to be photographed or videoed lol. I did get in mumble with some community members but it was just a hangout.

ggu7hgfk8j · 2024-11-30T09:34:26 1732959266

We are increasingly moving to ID checks. Australia law just now. For all its faults it solves spam as a side effect.

ranger_danger · 2024-11-30T17:56:06 1732989366

There are lots of random ID documents available on dark networks however.

qqqult · 2024-11-30T21:07:40 1733000860

It also makes it 100x more likely for your IDs to leak online as KYC companies are valuable targets that get hacked every month

bobsmooth · 2024-11-30T08:49:41 1732956581

A small signup fee is much easier.

grishka · 2024-11-30T13:05:26 1732971926

But it excludes people who don't have easy access to international banking.

3abiton · 2024-11-30T09:23:25 1732958605

I think captchas are just another line of defense to make it harder for actors abusing the system. It's not a solution, just a little (getting outdated) fortification.

poincaredisk · 2024-11-30T10:01:08 1732960868

Small? From your own link, recaptcha v3 takes 10-15s and costs $1.3 for 1000 captchas. This is actually huge, and prohibitively expensive for many things where you would want to use it (like scraping a large website).
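As a rough back-of-the-envelope check, the figures above ($1.3 per 1000 solves, 10-15 s per solve) can be scaled to a scraping job. The one-million-page site and the assumption that every request triggers a captcha solved one at a time are hypothetical worst-case inputs, not numbers from the linked service.

```python
# Figures quoted in the comment above.
PRICE_PER_1000_SOLVES_USD = 1.3
SECONDS_PER_SOLVE_RANGE = (10, 15)

def solve_cost_usd(solves):
    """Total solver fee for the given number of captchas."""
    return solves * PRICE_PER_1000_SOLVES_USD / 1000

def serial_days(solves, seconds_per_solve):
    """Wall-clock days if every solve happens one after another."""
    return solves * seconds_per_solve / 86_400

# Hypothetical job: one million pages, one captcha per page.
PAGES = 1_000_000
cost = solve_cost_usd(PAGES)
fastest, slowest = (serial_days(PAGES, s) for s in SECONDS_PER_SOLVE_RANGE)
```

Under those assumptions the fee comes to about $1,300 and the solves alone take roughly 116 to 174 days if done serially, so in practice the work would have to be parallelized across many sessions.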

costco · 2024-12-01T20:27:00 1733084820

Depends on the website, but you don't always get a recaptcha, so the cost is a lot lower than that. You usually get it if you're exceeding some rate limit or you're doing a sensitive action like registering.

RobotToaster · 2024-11-30T08:51:14 1732956674

> so really captchas are more like putting a small min amount of effort.

At that point a proof of work captcha (mCaptcha.org is one, but there are others), is probably the best option. Especially with how any reasonably effective traditional captcha is an accessibility nightmare.

cubefox · 2024-11-30T12:37:44 1732970264

It's completely unclear what a "proof of work" captcha is supposed to be.

jamesnorden · 2024-11-30T16:31:07 1732984267

It's CPU intensive JS code that must run to get an output that must match something server-side, the idea is that it makes attacks/spam not economically viable to run.

hombre_fatal · 2024-11-30T17:29:27 1732987767

The problem is that it doesn’t do anything. Maybe you slightly slow down a volumetric spam attack, but you’re just putting a sleep() before letting spam through which might be the worst solution. As for economic viability, it’s still just a sleep(). Even if it somehow did cost extra money to use more of the CPU, botnets don’t even use their own hardware.

And if you make the PoW so hard that it takes very very long to solve then you basically made a captcha that bots have no problem doing (it’s just time) and humans don’t want to do at all especially on their phone.

porridgeraisin · 2024-11-30T16:30:27 1732984227

Brave search uses it. From my limited understanding, it sends a time-consuming javascript function and its input to your browser, and has your browser calculate the output and send it back. The server matches your output with the expected output. I assume the server would pre-compute in some way? On the spectrum, it leans more towards being a spam-alleviating thing rather than a human-distinguishing thing.

porridgeraisin · 2024-11-30T16:34:48 1732984488

> pre-compute

Or it could be a SAT or something that's easy to verify and hard to solve.

shreyshnaccount · 2024-11-30T17:21:47 1732987307

I'd think it's some kind of proof of sequential work: basically an un-parallelizable calculation that is guaranteed to take a certain number of steps, making solving thousands of them much harder and hopefully not worth it.
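One simple way to picture sequential work is an iterated hash chain, sketched below. This is an illustration of the idea, not a description of what any particular captcha service does; the function name and step count are assumptions. Each step needs the previous output, so a single chain cannot be split across cores.

```python
import hashlib

def hash_chain(seed: bytes, steps: int) -> bytes:
    """Apply SHA-256 `steps` times in sequence; each step depends on the last."""
    value = seed
    for _ in range(steps):
        value = hashlib.sha256(value).digest()
    return value

# Hypothetical challenge: the server hands out a seed and a step count.
seed = b"example-seed"
answer = hash_chain(seed, 10_000)
```

A caveat with this naive version: checking the answer costs the server as much as solving it, unless it precomputed the result. Constructions designed for cheap verification (verifiable delay functions) exist, but they are more involved than this sketch. An attacker can also still run many independent chains in parallel, one per challenge.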

marcosdumay · 2024-11-30T19:39:57 1732995597

Almost always it's some variation of "give me a string with the SHA256 hash starting with 0.a471"
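A minimal sketch of that kind of challenge, in the style of Hashcash: find a nonce so that the SHA-256 of the challenge plus nonce starts with a given number of zero bits. The "challenge:nonce" encoding, the function names and the 12-bit difficulty are assumptions for illustration; real deployments pick their own format and target.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Cheap server-side check: a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits

def solve(challenge: str, difficulty_bits: int) -> int:
    """Expensive client-side search: try nonces until one passes."""
    for nonce in count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce

# Hypothetical challenge string issued by the server.
nonce = solve("example-challenge", 12)
```

The asymmetry is the point: solving takes about 2^difficulty_bits hashes on average, while verifying takes one. Unlike a sequential chain, this search parallelizes freely.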

nyclounge · 2024-11-30T10:10:17 1732961417

Wow, Funcaptcha costs the most and it is open source.

mieko · 2024-11-30T00:23:12 1732926192

If you're into this, here's my 2014 breakdown of the Silk Road CAPTCHA: https://github.com/mieko/sr-captcha

antirez · 2024-11-29T21:43:35 1732916615

Appropriate response by 4Chan to this: simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability.

codetrotter · 2024-11-29T23:52:45 1732924365

> simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability

Or disallow free users to post at all, and require everyone to buy the 4chan Pass for $20 USD per year if they want to post.

https://4chan.org/pass

This is already available to not have CAPTCHA. So if CAPTCHA is totally ineffective, it follows that they should do away with CAPTCHA and free users being able to post at all and everyone should buy the 4chan Pass if they want to post.

fullspectrumdev · 2024-11-30T04:10:39 1732939839

This kills the board. Users will go elsewhere, fuck all people pay for pass.

jachee · 2024-12-01T05:13:32 1733030012

And the spambots will follow them. Which kills the next board. Repeat ad nauseam until the end of the internet.

ranger_danger · 2024-11-30T00:44:30 1732927470

Agreed, charging for accounts is the only halfway viable solution I have seen any service use that gives a sizable downtick in the sheer number of bots/spam.

Of course it's not perfect, and it will still happen, but I have yet to hear any better solutions. Please prove me wrong though!

jcpham2 · 2024-11-30T08:13:24 1732954404

This is known as a Sybil [1] attack and it lays the groundwork for stuff like Adam Back’s Hashcash [2] protocol and it’s basically why things like proof of work [3] have a monetary value today.

Very chicken and egg this entire field- defending against the spammers while simultaneously operating a “free” system. How to do it without making it prohibitively expensive to join the system…

Any free system will be abused yada yada yada

[1] https://en.wikipedia.org/wiki/Sybil_attack

[2] https://en.wikipedia.org/wiki/Hashcash

[3] https://en.wikipedia.org/wiki/Proof_of_work

poincaredisk · 2024-11-30T10:07:15 1732961235

At this point I have to wait 90 seconds before making every post. (maybe because I don't persist cookies). I posted very rarely, but now I just stopped - I get it when someone shows me the door.

matheusmoreira · 2024-11-30T17:49:04 1732988944

That would work. It would also kill the site.

efilife · 2024-11-30T20:49:18 1732999758

What? So you use 4chan? It would completely kill what makes this website special

OkayBuddy44 · 2024-11-30T00:41:46 1732927306

[flagged]

codetrotter · 2024-11-30T00:58:32 1732928312

As a large language model, I don’t have access to up to date weather information.

Nah. But whatever you’re trying to imply, I think it would make more sense to claim that I am Hiroyuki Nishimura or something.

For now, I’m going to prescribe you 6 months of abstinence from /pol/ and we’ll do another evaluation after that.

kazinator · 2024-11-30T01:40:03 1732930803

Partly irrelevant with a 30% chance of hallucinations.

YeahThisIsMe · 2024-11-30T08:01:53 1732953713

We've been stuck at that point for at least 5, if not 10, years.

hackernewds · 2024-11-29T22:56:11 1732920971

Just use Worldcoin retina scans next

gosub100 · 2024-11-30T18:02:59 1732989779

"Drag each symbol to the group that is most likely to be offended by it."

xp84 · 2024-11-30T23:13:28 1733008408

Ooh I love this, all off-the-shelf AI won’t touch it due to all their “safety” (aka anti-hurt-feelings) protocols

encom · 2024-11-30T08:26:40 1732955200

4chan doesn't care about human annoyance. They just started doing a 15 minute post delay, which is infuriating. I had to whitelist 4chan in Cookie AutoDelete.

matheusmoreira · 2024-11-30T17:55:41 1732989341

Just stop posting there. The whole point of it is to post anonymously in a high traffic forum. The rate limiting timers have reduced traffic to the point many boards feel dead, and their solution to that problem is to sell accounts.

poincaredisk · 2024-11-30T10:10:03 1732961403

Hi fellow cookie autodeleter, I experienced the same thing, but I just decided to stop posting. Whitelisting felt too much like giving in to terrorists. I'm considering just not going there in the future. Maybe after all this time I will finally be free.

Arnavion · 2024-11-30T17:41:17 1732988477

Same. In my case I always use a separate incognito mode browser for posting and a regular locked-down browser with JS disabled etc. So I'd have to either give in and leave the incognito mode browser running in the background while I browse on the main browser, or give in and stop blocking as aggressively on the main browser, and I chose to do neither and just stop posting.

Given the schizos that are still present and drowning out the conversation in half the threads I read, there wouldn't be a point to posting anyway.

ValentinA23 · 2024-11-30T23:16:47 1733008607

Why are you doing this ? What are you trying to avoid ?

encom · 2024-11-30T14:35:09 1732977309

See you tomorrow, anon.

hsbauauvhabzb · 2024-11-30T00:48:06 1732927686

What is NN?

numpad0 · 2024-11-30T05:27:54 1732944474

"AI" but pre-COVID

marcosdumay · 2024-11-30T19:42:28 1732995748

Oh my!

Is the oversimplification from "deep neural network" into "AI" caused by the prevalence of brain-fog due to long COVID?

layer8 · 2024-11-30T01:02:52 1732928572

https://en.wikipedia.org/wiki/Neural_network_(machine_learni...

brodo · 2024-11-30T10:05:40 1732961140

I am totally in favor of increasing the annoyance of 4chan users.

somat · 2024-11-30T03:55:18 1732938918

I wonder if it would be better to pretend to have a captcha but really you are analysing the user timing and actions. Honestly I half suspect this is already going on.

If you wanted to go full meta ("never go full meta"), you would train an AI to figure out if the agent on the other side was human or not. That is, invent the reverse Turing test: it's a human if the AI is unable to differentiate its responses from normal humans' responses, as opposed to marketing human responses.

Well now I have to go have a lay down, I feel a little ill from even thinking on the subject.

wraptile · 2024-11-30T04:30:48 1732941048

That's kinda what every major captcha distributor does already!

Even before captcha is being served your TLS is first fingerprinted, then your IP, then your HTTP2, then your request, then your javascript environment (including font and image rendering capabilities) and browser itself. These are used to calculate a trust score which determines whether captcha will be served at all. Only then it makes sense to analyze captcha's input but by that time you caught 90% of bots either way.
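
A rough sketch of the layered scoring described above. The flag names, weights, and thresholds are all made up for illustration; no captcha vendor publishes its real signals, so treat this as a toy model of the idea rather than how any particular service works.

```python
# Hypothetical penalties: how much each failed check lowers trust.
# These names and numbers are assumptions, not any vendor's real values.
PENALTIES = {
    "tls_fingerprint_unknown": 0.30,
    "ip_reputation_bad": 0.35,
    "http2_settings_unusual": 0.15,
    "js_environment_inconsistent": 0.20,
}


def trust_score(flags: set[str]) -> float:
    """Start at full trust and subtract a penalty for every raised flag."""
    score = 1.0
    for flag in flags:
        score -= PENALTIES.get(flag, 0.0)
    return max(0.0, round(score, 2))


def decide(score: float, challenge_below: float = 0.7, block_below: float = 0.3) -> str:
    """Map a trust score to an action; a captcha is only served in the middle band."""
    if score < block_below:
        return "block"
    if score < challenge_below:
        return "captcha"
    return "allow"
```

With these assumed numbers, a clean client is allowed without ever seeing a captcha, a client with one bad signal gets challenged, and a client failing several checks is blocked outright.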

The amount your browser can tell about you to any server without your awareness is insane to the point where every single one of us probably has a more unique digital fingerprint than our very own physical fingerprint!

encom · 2024-11-30T08:22:44 1732954964

This is how ClownFlare and its ilk make life hell on the internet when you use a "weird" browser on a "weird" OS.

jeroenhd · 2024-11-30T08:37:55 1732955875

My experience is that IP reputation does a lot more for Cloudflare than browsers ever did. I tried to see if they'd block me for using Ladybird and Servo, two unfinished browsers (Ladybird used to even have its own TLS stack), but I passed just fine. Public WiFi in restaurants and shared train WiFi often gets me jumping through hoops even in normal Firefox, though.

I can't imagine what the internet must be like if you're still on CG-NAT, sharing an IP address with bots and spammers and people using those "free VPN" extensions donating their bandwidth to botnets.

gosub100 · 2024-11-30T18:00:32 1732989632

Re: your last paragraph, https://coveryourtracks.eff.org/

EFF have been running this for years. Gives an estimate about how many unique traits your browser has. Even things like screen resolution are measured.
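
One common way to put a number on "how unique" a trait is: convert the share of browsers that have the same value into bits of identifying information. The sketch below assumes the traits are independent, which real browser traits are not, so the combined figure is an upper-bound style estimate. The function names are made up for illustration.

```python
import math


def surprisal_bits(share_of_browsers: float) -> float:
    """Bits of identifying information in a trait shared by this fraction of browsers."""
    return -math.log2(share_of_browsers)


def combined_bits(shares: list[float]) -> float:
    """Total bits across traits, ASSUMING the traits are statistically independent."""
    return sum(surprisal_bits(s) for s in shares)
```

A trait shared by one browser in 1024 carries 10 bits; roughly 33 bits would be enough to single out one person among about 8 billion.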

zoltrix303 · 2024-11-30T06:01:06 1732946466

Would it be possible to serve a fake fingerprint that appears legitimate? Or even better, mimic the fingerprint of real users who've visited a site you own, for example?

wraptile · 2024-12-02T15:59:56 1733155196

Yes, that's what web scraping services do (full disclaimer I work at scrapfly.io). Collecting fingerprints and patching the web browser against this fingerprinting is quite a bit of work so most people outsource this to web scraping APIs.

nullpt_rs · 2024-11-30T06:21:09 1732947669

yep, but it can get tricky.

some projects worth checking out: https://github.com/refraction-networking/utls https://github.com/berstend/puppeteer-extra

saagarjha · 2024-12-01T09:37:35 1733045855

Unrelated, but who runs this account?

barbolo · 2024-11-30T10:14:30 1732961670

https://github.com/lwthiker/curl-impersonate

PUSH_AX · 2024-11-30T08:28:54 1732955334

In that case why do I ever receive a captcha?

Pikamander2 · 2024-11-30T09:30:19 1732959019

It adds another layer of analysis. For example:

If the user solves the CAPTCHA in 0.0001 seconds, they're definitely a bot.

If the user keeps solving every CAPTCHA in exactly 2.0000 seconds, each time makes it increasingly likely that they're a bot.

If the user sets the CAPTCHA entry's input.value property directly instead of firing individual key press events with keycodes, they're probably either a bot, copy-pasting the solution, or using some kind of non-standard keyboard (maybe accessibility software?).

Basically, even if the CAPTCHA service already has a decent idea of whether the user is a bot, forcing them to solve a CAPTCHA gives the service more data to work with and increases the barrier of entry for bot makers.
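
The three checks above could be sketched roughly like this. The thresholds (0.5 seconds, 0.05 seconds of spread, at least three attempts) are guesses picked for illustration, and `solve_times` is assumed to be a non-empty list of seconds per solved CAPTCHA.

```python
from statistics import pstdev


def suspicion(solve_times: list[float], used_key_events: bool) -> list[str]:
    """Return the reasons a session looks automated (empty list = nothing flagged)."""
    reasons = []
    # Assumed threshold: nobody reads and types a CAPTCHA in under half a second.
    if min(solve_times) < 0.5:
        reasons.append("too_fast")
    # Humans vary; near-identical solve times over several attempts look scripted.
    if len(solve_times) >= 3 and pstdev(solve_times) < 0.05:
        reasons.append("too_regular")
    # The answer appeared in the input without any per-key events being observed.
    if not used_key_events:
        reasons.append("no_key_events")
    return reasons
```

Returning reasons instead of a yes/no keeps the third case soft: a missing key event alone might be a paste or accessibility software, so it is better combined with other signals than treated as proof.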

sdk16420 · 2024-11-30T14:29:04 1732976944

I found several websites switched to 'press here until the timer runs out'. They are probably doing the checks while the user is holding the mouse button down, since the long press by itself would be trivial to bypass with automated mouse clickers.

kccqzy · 2024-11-30T04:06:53 1732939613

That's what reCAPTCHA does.

benreesman · 2024-11-30T08:42:23 1732956143

In my opinion the granddaddy of all 4chan CAPTCHA busts is still Yannic Kilcher’s GPT-J tune on the “Raiders of the Lost Kek” set, and it might be the coolest thing an LLM has ever done on video: https://youtu.be/efPrtcLdcdM?si=errY0PrEhnX9ylDw

chiph · 2024-11-30T16:16:25 1732983385

Nearly a full minute of disclaimers and warnings about 4chan. That's got to be a record.

ValentinA23 · 2024-11-30T23:36:04 1733009764

>I released the model, the code and I evaluated the model on a huge set of benchmarks and it turns out this horrible, terrible, model is more truthful-yes more truthful-than any other GPT out there

Pikamander2 · 2024-11-30T09:15:13 1732958113

> The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented.

> TensorFlow.js doesn't support Keras 3.

I tried getting into some casual machine learning stuff a few years ago and more or less gave up because of stuff like this. It was staggering how many recent tutorials were already outdated, how many random pitfalls there were, and how many "getting started" guides assumed you were already an expert.

sigmoid10 · 2024-11-30T09:34:41 1732959281

As someone who has been working in ML for years, I can only recommend staying away from anything recent. Grab an old Bayesian statistics textbook and learn the fundamentals, then progress to learning the major frameworks like PyTorch. Try to write every part of a CNN, RNN and Transformer architecture and training pipeline yourself the first time (including data loaders, but maybe leave out CUDA matrix kernels). Stay the hell away from wrappers for other people's wrappers like LangChain. Their documentation is often not just outdated, but flat out wrong regarding the fundamentals. Hugging Face is great if you know the basics and thus how to fix things if their standard wrappers break.

rohansuri · 2024-11-30T10:39:13 1732963153

Any book you would recommend?

sigmoid10 · 2024-11-30T16:30:32 1732984232

You can try Theodoridis if you can find a first or second edition. It is old enough to not be diluted by the recent craze but still recent enough to cover all the necessary fundamentals. There is also a new edition coming out soon, but that seems to have been heavily tainted by the ChatGPT hype.

ChrisMarshallNY · 2024-11-29T22:00:48 1732917648

That’s like spending a few hours, learning to take the lid off your septic tank.

blackjackfoe · 2024-11-29T22:16:08 1732918568

Little bit, but at least you learned something :)

gherkinnn · 2024-11-30T11:52:51 1732967571

Oddly enough, I find most of 4chan less brainrot inducing than Twitter, even pre-Musk.

JasserInicide · 2024-11-30T15:37:20 1732981040

It's still brainrot, it's just on the opposite end of the political spectrum.

irusensei · 2024-11-30T22:36:00 1733006160

Back when Llama was leaked on /g/, 4chan's /g/lmg/ was the best place to be up to date with local models. It still might be, but not so much.

People think 4chan is just /pol/ when in fact more boards exist and their users don't really appreciate when /pol/ leaks into their threads.

tovej · 2024-11-30T14:32:45 1732977165

There's no smart algorithm for sorting posts, and there's a limited number of active threads, so it's not rage baiting in quite the same way. Only active threads stay alive though, so it has the exact same issue as twitter and other social media, only engaging content is served to users, and the most engaging things are rage bait, conspiracy theories, and porn. Things that get someone riled up enough to respond.

thrance · 2024-11-30T12:01:30 1732968090

I have bad news for you, then...

Cumpiler69 · 2024-12-01T07:50:11 1733039411

What's the bad news?

meowface · 2024-11-30T17:22:34 1732987354

I am a liberal and also genuinely find many 4chan boards less politically awful than current Twitter most of the time.

The chronological sorting at least offers some diversity of opinion. The first 50 replies to a 4chan thread about Trump (in the right board) will usually contain many, maybe even mostly, anti-Trump posts. On Twitter you usually need to scroll through the sea of blue checkmark replies for a while to find even one anti-Trump post.

Some 4chan boards are majority neo-Nazis who want all minorities expelled or murdered. But stumble across a particular Twitter thread and it's the same thing but with even more ideological uniformity within the thread, and with 4000 neo-Nazis in the thread instead of 60.

That said, both sites definitely are not great to use if you aren't very right-wing.

salawat · 2024-11-30T00:01:01 1732924861

...Don't underestimate the things to be learned studying a septic system.

morkalork · 2024-11-29T22:01:47 1732917707

Following the links to the captcha solving service, you can read profiles of the humans doing the work, where it's pitched as more ethical than them working in hazardous factories!

Alifatisk · 2024-11-30T13:53:33 1732974813

If there is one blog I've fallen in love with, it's nullpt.rs. Still waiting for part 2 of Reverse Engineering Tiktok's VM Obfuscation

makifoxgirl · 2024-11-29T22:33:49 1732919629

This project also solves the 4chan captcha https://github.com/moffatman/chan

tumsfestival · 2024-11-29T21:30:49 1732915849

I can only imagine how much worse they'll make the captcha after stuff like this picks up speed with the users all the while being ineffective against the bots.

rany_ · 2024-11-29T21:34:55 1732916095

I really doubt that they're the first to do this.

OmarShehata · 2024-11-29T22:00:43 1732917643

captchas are broken, forever. There is no way to prevent bots without also preventing a bottom tier of human users (visually impaired people, old people, or just impatient people). Like this xkcd [1] comic suggests, we need to just focus on rewarding and punishing specific behavior, regardless of whether the agent is human or not

[1] https://xkcd.com/810/

shortrounddev2 · 2024-11-30T00:22:36 1732926156

Jokes aside, we don't want any bots at all. Even if they're posting constructive comments, we should interact with humans, not machines

hsbauauvhabzb · 2024-11-30T00:51:41 1732927901

That doesn’t mean that webcrawlers have no legitimate value (think: search indexers) or illegitimate value (think: intellectual property theft via data scraping for AI purposes), and bots which communicate, while they have no place, aren’t going to go away.

Philpax · 2024-11-30T07:47:36 1732952856

In the interest of provoking discussion: why?

If a bot can meaningfully pass and act as a productive member of the community, what does it matter?

matheusmoreira · 2024-11-30T18:07:27 1732990047

Because some of us go to sites like 4chan in order to learn what people really think. We want to see how they react and what they say when they are protected from consequences by the anonymous nature of the forum. We want the full spectrum of humanity, good and bad.

The opinions of bots are not just irrelevant, they are a form of consensus creation attack. They make it seem like a lot of people have an opinion when the reality might be the opposite. We are not interested in the made up realities that people pay bot operators to create. We want the truth, and the truth comes from real humans expressing their real unfiltered thoughts.

fragmede · 2024-11-30T18:34:01 1732991641

It's nice to want things. The people paying expensive programmers for bot armies to parrot their thoughts are currently paying cheaper humans sitting at a bank of beheaded cellphones to amplify their thoughts instead. You're being lied to regardless; the only difference is whether it's a shell script doing the lying or a paycheck to a human doing the lying.

Who's driving phone farm?

https://www.some3c.com/blogs/news/unified-control-20-pcs-pho...

matheusmoreira · 2024-11-30T21:07:26 1733000846

I'm aware of the risk. I try to mitigate it by also browsing smaller sites which are hopefully too small to be targeted by people with vested interests. And I know I'm being lied to. That's why I want to see every lie, every extreme. I'm especially interested in witnessing them try to debunk each other's lies. In the chaos, a synthesis is bound to emerge.

Because in the end it's up to us. We're the ones who have to draw the conclusions. At some point we're gonna have to decide whether some idea is right or wrong. This is much harder compared to just blindly taking a side at face value and just believing them and repeating what they say. I suppose it's possible that most people would prefer to be told what to think and what to say. I for one can't live like that. Things gotta make sense before I'll believe in them.

It's important to witness every possible argument and to see every single one of them viciously attacked on the proverbial ideological battleground. Then you can figure out which points remain convincing. Declaring oneself right, unwillingness to engage in debate, attempts to suppress opposing viewpoints, emotional appeals, these are all signs of authoritarianism. This is reason enough to cast everything they say into doubt. Good ideas don't need to be forced in this manner in order to convince.

shortrounddev2 · 2024-12-01T03:49:40 1733024980

Because I'm interested in what people have to say, not what machines have to say.

echelon · 2024-11-30T00:15:07 1732925707

I think a better approach is to make account creation frictionful (eg. charge money, set karma thresholds, require an invite, etc.), score each account, and ban or time out accounts when they break community rules.

But an even better approach would be to go fully P2P and leave the scoring and ranking and filtering at the end nodes, with the possibility of friendly networks of interest group peers assisting with the task. BitTorrent for social media, pgp signed accounts, fully flexible annotation and ingestion. It's also less subject to cabal-based censorship.

webstrand · 2024-11-30T02:32:51 1732933971

PoW like hashcash (not a cryptocurrency thing) might be a better solution. Users could even delegate solving the PoW puzzles to a 3rd party for low power devices like phones. But it imposes a cost on spammers that's inescapable.
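
A minimal sketch of the hashcash idea: the poster must find a nonce whose hash has a given number of leading zero bits, which is expensive to find but cheap to check. The original Hashcash uses SHA-1 and a structured stamp; this simplified variant swaps in SHA-256 and a plain `challenge:nonce` string, and the function names are made up for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce; expected work doubles with each extra bit of difficulty."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs a single hash, however hard it was to find."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

The server would hand out a fresh `challenge` per post (so solutions cannot be reused) and tune `difficulty` until solving takes a noticeable fraction of a second on ordinary hardware.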

jeroenhd · 2024-11-30T08:45:19 1732956319

That assumes spammers are using their own hardware to post. If they're using a botnet, they don't care about CPU cycles. Botnets would probably become even more profitable in that model.

cchance · 2024-11-29T21:59:31 1732917571

I mean at some point ... the average visitor is dumber than the AI and your now just blocking dumb people

OmarShehata · 2024-11-29T22:01:32 1732917692

yes, we're creating websites that are gated by IQ tests. This isn't the way

hsbauauvhabzb · 2024-11-30T01:08:34 1732928914

I’d like to believe I have at least an average IQ and I can’t pass half the google captchas.

Whether or not a square is part of the motorbike when it’s either the rider or a few pixels of the wheel is subjective and fuzzy. Fuck google for not making these questions clear cut enough that answers aren’t disputable.

djbusby · 2024-11-29T23:00:34 1732921234

*you're

ranger_danger · 2024-11-30T00:41:09 1732927269

For those that don't know, the JKCS extension has been doing this for years already:

https://addons.mozilla.org/en-US/firefox/addon/jkcs/

https://chromewebstore.google.com/detail/joshi-koukousei-cap...

Userscript version: https://github.com/drunohazarb/4chan-captcha-solver

blackjackfoe · 2024-11-30T00:47:40 1732927660

I really hope my post didn't come off as if I was trying to make it sound like this was a new idea. Regardless, this is good information, because it counters the posts of the form "great, now that you made this, you're going to make it harder."

ranger_danger · 2024-11-30T17:44:48 1732988688

I didn't look at it that way, just maybe that you (and/or others) might not have been aware of its existence since I didn't see it mentioned anywhere.

Yeul · 2024-11-30T11:25:41 1732965941

I understand why Cloudflare has to exist. But it's beyond annoying that it forces you into using an unmodified Chrome sans VPN.

hobom · 2024-11-29T23:47:53 1732924073

Does 4Chan also have bot BEHAVIOR detection (e.g. unnatural mouse movements) like Google's captcha has?

blackjackfoe · 2024-11-30T04:57:09 1732942629

It does not, at least not once you pass the Cloudflare Turnstile challenge (which can be done with an API as well.)

ipnon · 2024-11-30T04:55:48 1732942548

The results here suggest it does not.

kalleboo · 2024-11-30T03:44:42 1732938282

Yeah I had been under the impression that the point of captchas like this (and those "slide a puzzle piece" ones) weren't the solution to the problem as much as checking for human-like mouse movements.

chad1n · 2024-11-29T22:41:28 1732920088

I've built 3 iterations of captcha solvers for that crappy website based on https://github.com/drunohazarb/4chan-captcha-solver/issues/1 . The only thing I've learned along the way is that it's mostly pointless outside of a "learning" exercise, since they'll change the captcha (in terms of letter count or the entropy background). Initially, it was 4 characters with pretty obvious background, then it turned to 5, then it was both 4 and 5 and the current iteration which is also either 4 or 5, but with a lot of entropy surrounding the characters.

blackjackfoe · 2024-11-29T22:48:53 1732920533

This project was really my first decent introduction to computer vision and machine learning (along with that of those who helped me in various ways; none of them desired to be credited here other than the guy who collected some of the data for me.)

It was definitely a successful learning exercise, and it's made me more confident tackling some other problems I've had in mind for a while.

spookie · 2024-11-30T03:38:30 1732937910

To help you out if you're interested:

- a smeared gaussian in one axis and another in another axis can really help segmenting chars, finding lines of text in OCR

- You can unshear chars using the Radon or Hough transform as a basis to understand the angle

Went through MNIST a few weeks ago and I agree it's interesting!
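
One reading of the first tip, sketched in plain Python: collapse a binarized image onto one axis, blur that profile, and cut wherever it drops below a threshold. This uses a 1-D smoothed column projection rather than a full 2-D directional Gaussian blur, so it is a simpler stand-in for the technique, and the kernel and threshold are arbitrary choices.

```python
def column_profile(image: list[list[int]]) -> list[int]:
    """Count ink pixels (1s) in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def smooth(values: list[float], kernel=(0.25, 0.5, 0.25)) -> list[float]:
    """Blur a 1-D profile with a small Gaussian-like kernel (edges are zero-padded)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += weight * values[j]
        out.append(acc)
    return out


def segments(profile: list[float], threshold: float = 0.0) -> list[tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where the profile exceeds threshold."""
    spans, start = [], None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x
        elif value <= threshold and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```

Running the same thing over rows instead of columns gives candidate text lines; the blur is what keeps isolated noise pixels from splitting or creating segments.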

blackjackfoe · 2024-11-30T03:43:40 1732938220

I am always interested! Thank you for the tips, I'll definitely research these.

sorenjan · 2024-11-30T13:28:11 1732973291

Shearing is a linear operation that should be trivial for a NN to learn. Have you found that unshearing is actually useful? Was it to feed the image to an existing OCR program?
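
To make the "linear operation" point concrete: a horizontal shear is just the matrix [[1, k], [0, 1]] applied to each pixel coordinate, and undoing it is the same matrix with -k. The helper names below are made up for illustration.

```python
def shear(point: tuple[float, float], k: float) -> tuple[float, float]:
    """Horizontal shear: x' = x + k*y, y' = y, i.e. the matrix [[1, k], [0, 1]]."""
    x, y = point
    return (x + k * y, y)


def unshear(point: tuple[float, float], k: float) -> tuple[float, float]:
    """Inverse shear: the matrix [[1, -k], [0, 1]]."""
    x, y = point
    return (x - k * y, y)
```

Since the inverse is a single matrix with one free parameter, the only hard part of explicit unshearing is estimating k, which is where a Radon or Hough transform would come in.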

normie3000 · 2024-11-29T23:04:42 1732921482

How did this project help you to learn computer vision? I'd also like to write a basic captcha solver as an intro, but superficially this project just looks like a dump of generated code.

blackjackfoe · 2024-11-29T23:06:36 1732921596

What do you mean by "generated code"? All of the code in the linked GitHub repo was written by me, with the assistance of a couple friends who helped here and there, but didn't request to be credited.

I learned a lot because I had to do a ton of research and experimentation (fancy word for trial-and-error) to write the code and have it work as I expected.

normie3000 · 2024-11-30T06:54:16 1732949656

I think there's been a misunderstanding. I didn't understand you were the author of the linked article, and read the following exchange to mean you'd found the code at https://github.com/drunohazarb/4chan-captcha-solver to be a helpful introduction:

> > I've built 3 iterations of captcha solvers for that crappy website based on https://github.com/drunohazarb/4chan-captcha-solver/issues/1

> This project was really my first decent introduction to computer vision and machine learning

I see now that your code is linked from the article, and looks really informative - thanks for sharing!

throwaway314155 · 2024-11-30T03:12:15 1732936335

[flagged]

normie3000 · 2024-11-30T06:56:34 1732949794

I'm not sure this is helpful - please see my other reply.

From https://news.ycombinator.com/newsguidelines.html

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

bryan0 · 2024-11-29T22:47:12 1732920432

In the article it mentions they changed the number of characters in the captcha after he trained the model, and the model could still solve it