G-WAN Captcha Decode (gist.github.com)
80 points by samuirai on June 14, 2012 | 50 comments



This is something I find absolutely bizarre: for every person who is willing to make ridiculously exaggerated claims about something, there are thousands willing to believe them. One example which springs to mind is the "spam free wordpress" plugin[0], which claims to use "anonymous password authentication to block 100% of all comment spam with zero false positives."

As it turns out, "anonymous password authentication" means that it gives you a randomly generated string which you have to copy and paste into another box each time you submit a comment. It sounds like something a machine could do because it is something a machine could do with ease. And yet nobody seems to have noticed this for two reasons:

1) The author deletes all the comments on his blog questioning why the system works

2) It has a 4.5 star rating on the WordPress plugin DB

As a result, the author is still making ridiculously exaggerated claims about the capability of his system, like "If Gawker had been using the anonymous password authentication built into Spam Free WordPress this incident [the Gawker break-in in 2010] would not have happened." Another gem is "CAPTCHA is not used because it is hard to read, unnecessary, easily cracked, and reduces the number of real comments substantially."

So there it is, another snake-oil salesman spreading FUD and making users (of some very popular websites[1]) suffer.

[0]: http://www.toddlahman.com/spam-free-wordpress/

[1]: http://www.raspberrypi.org/
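To illustrate why copying a displayed string into another box is no obstacle for a script, here is a minimal sketch. The markup, the element id `pwd-display`, and the field names are all hypothetical, since the plugin's real HTML is not shown in this thread; only the Python standard library is used.

```python
from html.parser import HTMLParser


class TokenFinder(HTMLParser):
    """Collects the text of the element that displays the generated string."""

    def __init__(self, token_id):
        super().__init__()
        self.token_id = token_id  # hypothetical id of the element showing the string
        self._inside = False
        self.token = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.token_id:
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.token += data.strip()


def fill_comment_form(page_html, comment, token_id="pwd-display"):
    """Return the form fields a bot would POST: the comment plus the copied string."""
    finder = TokenFinder(token_id)
    finder.feed(page_html)
    return {"comment": comment, "password": finder.token}


# Hypothetical page markup, assumed for illustration only.
page = '<form><span id="pwd-display">k3Xq9ZpL</span><input name="password"></form>'
fields = fill_comment_form(page, "Buy cheap pills")
```

Anything the page shows to a human for copying is, by construction, also present in the response a bot receives.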


> [1]: http://www.raspberrypi.org/

If I had not witnessed that, I would never have believed that it happened. The Raspberry Pi team fell for something that any kid who can write a Python script should have known had the utility of a voodoo incantation? I'm floored.


On the bright side, web server performance, reliability, and maintenance are much more concrete than "amount of spam prevented", and general internet meritocracy has worked well towards ensuring G-WAN's lack of adoption.


Pierre (the sole G-WAN author) says some funny things. He also defends G-WAN using dummy accounts all over the internet (StackOverflow, Reddit, Wikipedia, etc). You can respect his coding knowledge if you can force yourself to ignore his high claims and self-importance. But good job tearing the captcha apart!


> You can respect his coding knowledge if you can force yourself to ignore his high claims and self-importance.

Still seems totally insane to me. The extent of his features are things like "DNS lookups are asynchronous in all libraries".


I was looking at gwan with some friends a year ago; at that time it was presented as a C-script-page server. The idea of scripting with C was intriguing. But then it just didn't work: it was unstable, and the author claimed to be a victim of some odd spy-movie-like conspiracy. That was really weird, and funny somehow.

And the company name "TrustLeap" ... seriously ? http://trustleap.ch disappeared by the way.. here you go http://web.archive.org/web/20110707004226/http://www.trustle...

http://web.archive.org/web/20091028041609/http://trustleap.c...

http://web.archive.org/web/20110707004059/http://www.trustle...

Now it looks like they redesigned the site, changing the target and adding new buzzwords here and there (removing the obsolete buzz no one is using anymore).

see also http://news.ycombinator.com/item?id=2776927

And to be honest, this kind of restriction on a web server sounds crazy:

=== “Server identification field” means the field in the response header which contains the text “Server: G-WAN/x.x.x” where “x.x.x” is the program version number.

You agree not to remove or modify the server identification field contained in the response header. ===

Funny stuff!


Well done.

Except, I'm pretty sure the gwan guys are just a very sophisticated group of trolls.

I'm going to go out on a limb and say if you test their web server yourself you'll find that their claims are false, and that the reason it's closed source is because the joke would be too obvious if we could simply take a quick look.

In any event this is all hilarious.

Now, I hope someone with the time and curiosity will do this.. and publish some actual benchmarks of gwan so this controversy can finally end.


In the last GWAN thread[1] a user[2] who said they have been using it for a year claimed it was all quite true. The intimation in this thread is that this was actually the author himself sockpuppeting. Indeed, the account had only 3 comments in the last year[3] until the whole GWAN thing, which is certainly suspicious, though certainly no proof.

[1] http://news.ycombinator.com/item?id=4109698

[2] http://news.ycombinator.com/user?id=ers35

[3] http://news.ycombinator.com/threads?id=ers35


If you are interested in benchmarks of G-WAN, see [1].

To run a benchmark yourself, see [2].

[1]http://gwan.ch/benchmark

[2]http://gwan.ch/source/ab.c


The more I see of 'original' captchas, the more I think people should be taking the same attitude towards captchas as they do to cryptography. Just use a well known library that has proved to be hard to break by the test of time and heavy use.


Like ReCaptcha? Man, I was signing up for something last night and failed to solve it 3 times. Captchas are getting harder for humans to solve than bots - is this really a good thing?


(is your ReCaptcha reference to it being broken? If so then I think my point is proved by how quickly it was fixed)

As for making it harder for humans, I very much see your point, but the solution isn't to come up with some of the trivial captchas that many come up with by themselves.


I've come to believe that this is the goal: to make human captcha factories more expensive.


The price for human-solving 1000 captchas is $2. You get ~98% accuracy. There is no secure captcha library.

Captchas as a concept are flawed and should be replaced by something that $works. (Don't ask me what that could be - I have no idea.)


Oh, I agree. But I'd argue a working captcha is at least somewhat preferable to a broken poorly made one, even if they can both be trivially mechanically-turked.

Sadly the only solution I can see to the overall 'Captchas are broken' problem that's currently available is forcing people to link to an established identity like a Google account or a Facebook account. This then of course recurses to how you can prevent automated Google/Facebook sign-ups. I wonder whether Google/Facebook could use some kind of heuristic for detecting genuine users of the service? (Maybe a Facebook account that plays games or uploads photos regularly, or has attended a few events, could be a threshold?)


I keep waiting to see them replaced with proofs-of-work.
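For readers unfamiliar with the idea, a proof-of-work scheme can be sketched roughly as below. This is a hashcash-style illustration using SHA-256 from the standard library, not a description of any deployed system; the challenge string and the difficulty value are assumptions. Note that it raises the cost per submission rather than telling humans from bots.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge, difficulty):
    """Client side: search for a nonce whose hash has `difficulty` leading zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge, nonce, difficulty):
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


challenge = "comment-form-2012-06-14"    # hypothetical per-request challenge string
nonce = solve(challenge, difficulty=12)  # roughly 4096 hashes on average
assert verify(challenge, nonce, difficulty=12)
```

The asymmetry is the point: the client does thousands of hashes, the server does one. Each extra bit of difficulty doubles the expected client work.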


[Reposted from http://news.ycombinator.com/item?id=4113609]

That example is easy to solve because it is not using any of the provided techniques that make it more difficult for robots to solve the CAPTCHA: "changing the HTML background color based on: mouse cursor hovering, previous state or input or shared secret"

The purpose of the example is to give you a basis on which you could implement an effective CAPTCHA.

The claim of "difficult or even completely impossible for robots" applies to CAPTCHAs using the above techniques, which are not used in the example.


So you're implying that if you changed the HTML colors based on mouse events, the claim of "difficult or even completely impossible for robots" would hold?

Such a claim would be no less ludicrous.

I'd put my money on there being hundreds of people on Hacker News alone that could script a DOM-monkeying cracker for such a system. That runs with ~100% accuracy. And in under an hour of coding.

This Captcha strategy is so absolutely terrible in light of modern libraries that I'm honestly shocked you feel the need to defend it.


The best way for me to solve this dispute is to implement a CAPTCHA using some of the proposed techniques. If I fail, then I was mistaken. If I succeed, then maybe people will give G-WAN a try.


Still don't get it


The CAPTCHA example is incomplete. It is missing the part that actually makes it difficult for robots to solve. The purpose of the example was to provide a list of ways to implement an effective CAPTCHA.


Then why are you claiming you've done _anything_? There is literally _nothing_ new here - you're telling me stateful web apps are your invention?


I am not the author of G-WAN. I am not claiming to have done anything.

This whole thread is based on a misunderstanding. G-WAN does not have any CAPTCHA support. The example merely demonstrates how to use G-WAN's image generation API.

The CAPTCHA example is not meant to be used without modification in an application. CAPTCHAs are not a feature of G-WAN.

Nowhere did anyone claim to have invented stateful web applications.


Reading through 2 threads I'm convinced you're the author. Why not just admit it?



"Proof"? When I was 20 years younger, and the internet was fresher, I had several handles in Usenet, obviously inspired by Ender's Game.


Can you show me an example? Because I can't think of one.


I do not have an example handy, so hopefully an explanation is enough.

The CAPTCHA example has two parts: generating the CAPTCHA values and presenting them in a way that is easy for humans to solve, but difficult for computers.

This example only implements part one, but gives some methods of how one could implement part two. This is why it is so easy for you to write a script to solve it.

Using mouse cursor hovering, one could change the background color behind the image based on an event (such as a mouseover of an input field) that is probable for a human to make, but less so for a computer. This is difficult for a computer to solve, but not impossible to break.

G-WAN applications are persistent, meaning it is trivial to record stateless actions the client has made in the past for use in the future. The client would have to provide a known good value from the past in order to solve the CAPTCHA. This is what is meant by "previous state or shared secret".


> Using mouse cursor hovering, one could change the background color behind the image based on an event (such as a mouseover of an input field) that is probable for a human to make, but less so for a computer. This is difficult for a computer to solve, but not impossible to break.

How would this reveal the CAPTCHA value to a human but not to a computer? If the string is readable over only some background colors, then it's written on a transparent-backed image, so the whole background-changing script can be ignored. Just OCR the image with the transparent background.

If you're suggesting the CAPTCHA itself be the movement of the mouse over specific inputs, rather than deciphering a string, then this is trivial to break as well. The code that watches the mouse events and does whatever it does to indicate human-ness has to be written in JavaScript and transmitted to the browser... which means it's sitting right there to be analyzed and copied by the bot author. They don't need to replicate the mouse movement, just trigger the same code the correct movement triggers.

Requiring previous state adds nothing to the test either. If a human has to visit a certain sequence of pages before submitting a form, the bot can make the same sequence of HTTP requests and replay the same cookies or however you track the state.

It doesn't sound like you know what you're talking about.
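The point about previous state can be made concrete with a toy model. Everything here is hypothetical (the `ToyServer` class, the page paths, the cookie handling) and runs in memory rather than over HTTP; it only sketches why a required page sequence is something a bot can replay as easily as a human can follow it.

```python
import secrets


class ToyServer:
    """Hypothetical stateful app: the form is accepted only after visiting pages in order."""

    REQUIRED = ["/", "/article", "/comment-form"]

    def __init__(self):
        self.sessions = {}  # cookie value -> list of paths visited so far

    def get(self, path, cookie=None):
        # Issue a fresh session cookie to unknown clients, then record the visit.
        if cookie not in self.sessions:
            cookie = secrets.token_hex(8)
            self.sessions[cookie] = []
        self.sessions[cookie].append(path)
        return cookie

    def post_comment(self, cookie):
        # Accept the comment only if this session followed the required sequence.
        return self.sessions.get(cookie) == self.REQUIRED


def bot(server):
    """Replay the page sequence a human would follow, carrying the cookie along."""
    cookie = None
    for path in ToyServer.REQUIRED:
        cookie = server.get(path, cookie)
    return server.post_comment(cookie)


server = ToyServer()
accepted = bot(server)
```

A client that skips the sequence is rejected, but nothing in the check distinguishes a script from a browser.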


> If the string is readable over only some background colors, then it's written on a transparent-backed image, so the whole background-changing script can be ignored. Just OCR the image with the transparent background.

The CAPTCHA image contains false values for misdirection, hence the requirement of running the background changing script.

Here is a list of claims made on the example CAPTCHA page:

  BG color generation    Ability to solve (as defined by the author)
  -------------------    -------------------------------------------
  mouse cursor hovering  difficult
  previous state         impossible
  input
  shared secret

I have no way to know precisely what the author means by those words since the given example does not demonstrate any of them. I cannot match the left and right columns.

For example, if I claimed that a CAPTCHA based off a "shared secret" only a trusted user has is impossible to solve, then one might ask the question, "Why is a CAPTCHA needed for a trusted user?". Is this what the author of the CAPTCHA example meant? I do not know.

My only purpose of posting in this thread was to show that the OP's premise that "this is the best example of how not to implement it." is flawed because no such example was given in the first place.


Nothing about that sounds unique from existing CAPTCHA systems, nor does it sound hard to break. You're going to use CSS or Javascript to "hide" it from a computer? That uhm, doesn't make much sense (to me).


Not to mention something like Sikuli (in Jython) can actually make your mouse hover and record values...

Plus how is that easy for humans to solve? 1. They need to know they can put their mouse over the text 2. Then record the text 3. Sum the numbers 4. Write the result...

If I had to do this every time I wanted to log in to a site I'd seriously consider switching. And this is the 'best' captcha, impossible to break - for humans. Machines can solve it easily.


I didn't read through all his code...but wouldn't OP's approach only work if they used the same GIF over and over?

In any case, it seems trivially easy to break. Just capture the image. Read the background color value. Generate the image (with the background color) in ImageMagick and run through your OCR of choice. Obviously, that's not the fastest way to do it if you're trying to do thousands of attempts at once, but it's the least brainpower-involved.


No, he just hard-coded the number shapes. As long as the numbers use the same font as the example (and don't overlap), it should work just fine.

OCR would probably be more robust in general (for varying fonts and number shapes)... but it's simply absurd to call G-WAN's scheme a better captcha. More obscure and less targeted? Perhaps.


I thought about using OCR - just to over-engineer it. But I wanted to show how the characters are perfectly aligned and how clear the font is. I would like to understand why he thought this Captcha is so special.


I need two more points to make up for the haters.

Come find us?


Both you and the OP have misunderstood the purpose of the CAPTCHA example. See my post[1] for an explanation.

[1]http://news.ycombinator.com/item?id=4113826


Why don't you identify yourself as the author?


I am not the author. For "proof", see my many conversations on the G-WAN forum.[1]

[1]http://forum.gwan.com/index.php?p=/profile/discussions/1053/...


I don't think he is. I read the G-WAN author's replies. The author doesn't have as good a grasp of English and he sounds way crazier.


I haven't run the code, but I did completely read it.

It appears to me that he maps the character set GWAN uses for the captcha system, and as such, should work for any image generated by GWAN using the same character set as it simply identifies pixels matching the character set.
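The approach described above, matching pixels against a known character set, can be sketched as follows. This is not the code from the gist: the 3x5 glyph bitmaps, the fixed width, and the one-column gap are invented for illustration and are not G-WAN's actual font.

```python
# Hypothetical 3x5 glyph bitmaps ('#' = ink, '.' = background); not G-WAN's real font.
GLYPHS = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "2": ("###", "..#", "###", "#..", "###"),
    "3": ("###", "..#", "###", "..#", "###"),
}
WIDTH = 3  # every glyph has the same width
GAP = 1    # blank columns between neighbouring glyphs

# Reverse table: glyph bitmap -> character.
LOOKUP = {shape: char for char, shape in GLYPHS.items()}


def render(text):
    """Stand-in for the captcha generator: draw `text` as rows of pixels."""
    return [("." * GAP).join(GLYPHS[c][r] for c in text) for r in range(5)]


def decode(rows):
    """Cut the image into fixed-width cells and look each cell up in the table."""
    out = []
    for x in range(0, len(rows[0]), WIDTH + GAP):
        cell = tuple(row[x:x + WIDTH] for row in rows)
        out.append(LOOKUP.get(cell, "?"))
    return "".join(out)


image = render("2031")
decoded = decode(image)
```

Because the characters are perfectly aligned and never distorted, an exact table lookup suffices; no OCR or fuzzy matching is involved in this sketch.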


I think this Captcha algorithm is optional. I'm using G-WAN, and I implemented another CAPTCHA algorithm in C. I like it when someone proves that something is unsafe. It inspires us to improve our applications.


I've never even heard of G-WAN until now, so I guess it's working..


G-Wan is like the Great Old ones. It's time to forget it again...


How about using NoCaptcha technique?


All the people who made something fantastic were, in some part, crazZzyYy.

But I think the G-WAN author needs to chill out and start rethinking.

G-WAN should be OK for, e.g., a CDN.

For hosters:

* need options to turn off the script language

* support modules like in Apache

* support .htaccess

For me:

* add fcgid, or something better and faster if that's possible; show that PHP on G-WAN can be blazing fast

* add modules for sessions, MongoDB, Redis, etc.

* should be an OK replacement for Apache

* recode a version for Windows for developers

Go to some IT conference, run 2 machines with NginX fully optimized for speed plus a Gigabyte network, and show people how it works: CPU usage and requests per second, to compare.


For Fabian Fäßler, from DC949 chat:

  <@merlin_> do we know this guy?
  ... snip ...
  <@merlin_> """Today I had my first lightning talk at #BerlinSides_0x3"""
  <@Kos> OH YEAH
  <@Kos> that adude started following me a few weeks ago
  <@C-Ps> berlin sids is pretty slick
  ... snip ...
  <@Kos> I probably  meat that dude at berlinsides
  <@Kos> erm
  <@Kos> met
  <@savant42> meat, eh?
   ... snip ...
  <@C-Ps> do you often meat men in berlin?
Thank you for providing the lulz, as well as the link to stiltwalker!


Haters gonna hate! At least he replied above -- mission accomplished.


I feel a bit observed and I don't know whether I should feel fear or joy... I'm confused.


It was appreciation for your work and your link to ours, as well as an invitation. So I would go more on joy, unless of course you are afraid of hackers that like to drink and break things, not necessarily in that order, but often.



