This is something I find absolutely bizarre: for every person who is willing to make ridiculously exaggerated claims about something, there are thousands willing to believe them. One example which springs to mind is the "spam free wordpress" plugin[0], which claims to use "anonymous password authentication to block 100% of all comment spam with zero false positives."
As it turns out, "anonymous password authentication" means that it gives you a randomly generated string which you have to copy and paste into another box each time you submit a comment. It sounds like something a machine could do because it is something a machine could do with ease. And yet nobody seems to have noticed this for two reasons:
1) The author deletes all the comments on his blog questioning why the system works
2) It has a 4.5 star rating on the WordPress plugin DB
As a result, the author is still making ridiculously exaggerated claims about the capability of his system, like "If Gawker had been using the anonymous password authentication built into Spam Free WordPress this incident [the Gawker break-in in 2010] would not have happened." Another gem is "CAPTCHA is not used because it is hard to read, unnecessary, easily cracked, and reduces the number of real comments substantially."
So there it is, another snake-oil salesman spreading FUD and making users (of some very popular websites[1]) suffer.
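To make the "a machine could do it with ease" point concrete, here is a minimal sketch of a bot passing a copy-the-string check. The form markup and the field names (`pwd_display`, `pwd_input`) are hypothetical and not taken from the plugin. The only assumption is that the generated string is present somewhere in the page the server sends to the browser.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical comment form: the "password" is shown in one field and must be
# copied into another. The field names are invented for illustration.
SAMPLE_FORM = """
<form action="/comment" method="post">
  <input type="text" name="pwd_display" value="k3Jq9xT2" readonly>
  <input type="text" name="pwd_input" value="">
  <textarea name="comment"></textarea>
</form>
"""

class InputCollector(HTMLParser):
    """Collects name -> value for every <input> tag in the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            if "name" in attr:
                self.fields[attr["name"]] = attr.get("value") or ""

def build_spam_post(html, comment):
    """Return the urlencoded body a bot would POST: it just copies the string."""
    parser = InputCollector()
    parser.feed(html)
    password = parser.fields["pwd_display"]
    return urlencode({"pwd_input": password, "comment": comment})

body = build_spam_post(SAMPLE_FORM, "buy pills")
print(body)
```

If the string were inserted by a script instead of sitting in the markup, the bot would have to read it from wherever the script gets it, but it is still data the server hands to every visitor.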
If I had not witnessed that, I would never have believed that it happened. The Raspberry Pi team fell for something that any kid who can write a Python script should have known had the utility of a voodoo incantation? I'm floored.
On the bright side, web server performance, reliability, and maintenance are much more concrete than "amount of spam prevented", and general internet meritocracy has worked well towards ensuring G-WAN's lack of adoption.
Pierre (the sole G-WAN author) says some funny things. He also defends G-WAN using dummy accounts all over the internet (StackOverflow, Reddit, Wikipedia, etc). You can respect his coding knowledge if you can force yourself to ignore his high claims and self-importance. But good job tearing the captcha apart!
I was looking at gwan with some friends a year ago; at that time it was presented as a C-script-page server. The idea of scripting with C was intriguing. But it just didn't work: it was unstable, and the author claimed to be the victim of some odd spy-movie-like conspiracy. That was really weird, and funny somehow.
Now it looks like they redesigned the site, changing the target and adding new buzzwords here and there (removing the obsolete buzz no one is using anymore).
And to be honest, this kind of restriction on a web server sounds crazy:
===
“Server identification field” means the field in the response header which contains the text “Server: G-WAN/x.x.x” where “x.x.x” is the program version number.
You agree not to remove or modify the server identification field contained in the response header.
===
Except, I'm pretty sure the gwan guys are just a very sophisticated group of trolls.
I'm going to go out on a limb and say if you test their web server yourself you'll find that their claims are false, and that the reason it's closed source is because the joke would be too obvious if we could simply take a quick look.
In any event this is all hilarious.
Now, I hope someone with the time and curiosity will do this and publish some actual benchmarks of gwan so this controversy can finally end.
In the last GWAN thread[1] a user[2] who said they have been using it for a year claimed it was all quite true. The intimation in this thread is that this was actually the author himself sockpuppeting. Indeed, the account had only 3 comments in the last year[3] until the whole GWAN thing, which is certainly suspicious, though certainly no proof.
The more I see of 'original' captchas, the more I think people should be taking the same attitude towards captchas as they do to cryptography. Just use a well known library that has proved to be hard to break by the test of time and heavy use.
Like ReCaptcha? Man, I was signing up for something last night and failed to solve it 3 times. Captchas are getting harder for humans to solve than bots - is this really a good thing?
(is your ReCaptcha reference to it being broken? If so then I think my point is proved by how quickly it was fixed)
As for making it harder for humans, I very much see your point, but the solution isn't to come up with some of the trivial captchas that many come up with by themselves.
Oh, I agree. But I'd argue a working captcha is at least somewhat preferable to a broken poorly made one, even if they can both be trivially mechanically-turked.
Sadly the only solution I can see to the overall 'Captchas are broken' problem that's currently available is forcing people to link to an established identity like a Google account or a Facebook account. This then of course recurses to how you can prevent automated Google/Facebook sign-ups. I wonder whether Google/Facebook could use some kind of heuristic for detecting genuine users of the service? (maybe a Facebook account that plays games or uploads photos regularly, or has attended a few events could be a threshold?)
That example is easy to solve because it is not using any of the provided techniques that make it more difficult for robots to solve the CAPTCHA: "changing the HTML background color based on: mouse cursor hovering, previous state or input or shared secret"
The purpose of the example is to give you a basis on which you could implement an effective CAPTCHA.
The claim of "difficult or even completely impossible for robots" applies to CAPTCHAs using the above techniques, which are not used in the example.
So you're implying that if you changed the HTML colors based on mouse events, the claim of "difficult or even completely impossible for robots" would hold?
Such a claim would be no less ludicrous.
I'd put my money on there being hundreds of people on Hacker News alone that could script a DOM-monkeying cracker for such a system. That runs with ~100% accuracy. And in under an hour of coding.
This Captcha strategy is so absolutely terrible in light of modern libraries that I'm honestly shocked you feel the need to defend it.
The best way for me to solve this dispute is to implement a CAPTCHA using some of the proposed techniques. If I fail, then I was mistaken. If I succeed, then maybe people will give G-WAN a try.
The CAPTCHA example is incomplete. It is missing the part that actually makes it difficult for robots to solve. The purpose of the example was to provide a list of ways to implement an effective CAPTCHA.
I am not the author of G-WAN. I am not claiming to have done anything.
This whole thread is based on a misunderstanding. G-WAN does not have any CAPTCHA support. The example merely demonstrates how to use G-WAN's image generation API.
The CAPTCHA example is not meant to be used without modification in an application. CAPTCHAs are not a feature of G-WAN.
Nowhere did anyone claim to have invented stateful web applications.
I do not have an example handy, so hopefully an explanation is enough.
The CAPTCHA example has two parts: generating the CAPTCHA values and presenting them in a way that is easy for humans to solve, but difficult for computers.
This example only implements part one, but gives some methods of how one could implement part two. This is why it is so easy for you to write a script to solve it.
Using mouse cursor hovering, one could change the background color behind the image based on an event (such as a mouseover of an input field) that is probable for a human to make, but less so for a computer. This is difficult for a computer to solve, but not impossible to break.
G-WAN applications are persistent, meaning it is trivial to record stateless actions the client has made in the past for use in the future. The client would have to provide a known good value from the past in order to solve the CAPTCHA. This is what is meant by "previous state or shared secret".
> Using mouse cursor hovering, one could change the background color behind the image based on an event (such as a mouseover of an input field) that is probable for a human to make, but less so for a computer. This is difficult for a computer to solve, but not impossible to break.
How would this reveal the CAPTCHA value to a human but not to a computer? If the string is readable over only some background colors, then it's written on a transparent-backed image, so the whole background-changing script can be ignored. Just OCR the image with the transparent background.
If you're suggesting the CAPTCHA itself be the movement of the mouse over specific inputs, rather than deciphering a string, then this is trivial to break as well. The code that watches the mouse events and does whatever it does to indicate human-ness has to be written in JavaScript and transmitted to the browser... which means it's sitting right there to be analyzed and copied by the bot author. They don't need to replicate the mouse movement, just trigger the same code the correct movement triggers.
Requiring previous state adds nothing to the test either. If a human has to visit a certain sequence of pages before submitting a form, the bot can make the same sequence of HTTP requests and replay the same cookies or however you track the state.
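A minimal sketch of that last point, using an in-memory toy instead of real HTTP so it runs anywhere. The `StatefulSite` class, the `/step1` and `/step2` paths, and the cookie scheme are all hypothetical stand-ins for whatever "previous state" a server might track; nothing here is G-WAN's API.

```python
import secrets

class StatefulSite:
    """Toy server: the form is accepted only if the last two pages visited
    with the same session cookie were /step1 and then /step2."""

    def __init__(self):
        self.sessions = {}  # cookie -> list of visited paths

    def get(self, path, cookie=None):
        # Issue a fresh cookie to unknown clients, then record the visit.
        if cookie is None or cookie not in self.sessions:
            cookie = secrets.token_hex(8)
            self.sessions[cookie] = []
        self.sessions[cookie].append(path)
        return cookie

    def post_form(self, cookie, comment):
        visited = self.sessions.get(cookie, [])
        return visited[-2:] == ["/step1", "/step2"]

def bot_submit(site, comment):
    """The bot makes the same requests a human would and replays the cookie."""
    cookie = site.get("/step1")
    cookie = site.get("/step2", cookie)
    return site.post_form(cookie, comment)

site = StatefulSite()
accepted = bot_submit(site, "spam")
rejected = site.post_form("no-such-cookie", "spam")
print(accepted, rejected)
```

The state check only rejects clients that skip the sequence. A bot that walks the same sequence is indistinguishable from a browser at this level.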
It doesn't sound like you know what you're talking about.
> If the string is readable over only some background colors, then it's written on a transparent-backed image, so the whole background-changing script can be ignored. Just OCR the image with the transparent background.
The CAPTCHA image contains false values for misdirection, hence the requirement of running the background changing script.
Here is a list of claims made on the example CAPTCHA page:
BG color generation      Ability to solve (as defined by the author)
---------------------    -------------------------------------------
mouse cursor hovering    difficult
previous state           impossible
input
shared secret
I have no way to know precisely what the author means by those words since the given example does not demonstrate any of them. I cannot match the left and right columns.
For example, if I claimed that a CAPTCHA based off a "shared secret" only a trusted user has is impossible to solve, then one might ask the question, "Why is a CAPTCHA needed for a trusted user?". Is this what the author of the CAPTCHA example meant? I do not know.
My only purpose of posting in this thread was to show that the OP's premise that "this is the best example of how not to implement it." is flawed because no such example was given in the first place.
Nothing about that sounds unique from existing CAPTCHA systems, nor does it sound hard to break. You're going to use CSS or JavaScript to "hide" it from a computer? That, uhm, doesn't make much sense (to me).
Not to mention something like Sikuli (in Jython) can actually make your mouse hover and record values...
Plus how is that easy for humans to solve?
1. They need to know they can put their mouse over the text
2. Then record the text
3. Sum the numbers
4. Write the result...
If I had to do this every time I wanted to log in to a site I'd seriously consider switching. And this is the 'best' captcha impossible to break - for humans. Machines can solve it easily.
I didn't read through all his code...but wouldn't OP's approach only work if they used the same GIF over and over?
In any case, it seems trivially easy to break. Just capture the image. Read the background color value. Generate the image (with the background color) in ImageMagick and run through your OCR of choice. Obviously, that's not the fastest way to do it if you're trying to do thousands of attempts at once, but it's the least brainpower-involved.
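A rough sketch of the "read the background color, regenerate the image" step. It swaps ImageMagick for a pure-Python alpha composite so it runs without dependencies, and it stops before the OCR step. The tiny pixel grid, the white page background, and the white "decoy" ink are invented test data, not values taken from the G-WAN example.

```python
def flatten(pixels, background):
    """Alpha-composite RGBA pixels over an opaque RGB background."""
    out = []
    for row in pixels:
        out_row = []
        for r, g, b, a in row:
            alpha = a / 255
            out_row.append(tuple(
                round(c * alpha + bg * (1 - alpha))
                for c, bg in zip((r, g, b), background)
            ))
        out.append(out_row)
    return out

def binarize(rgb_rows, background, tolerance=30):
    """Mark as ink (1) every pixel that differs visibly from the background."""
    return [
        [1 if sum(abs(c - bg) for c, bg in zip(px, background)) > tolerance else 0
         for px in row]
        for row in rgb_rows
    ]

T = (0, 0, 0, 0)          # fully transparent
K = (0, 0, 0, 255)        # opaque black ink
W = (255, 255, 255, 255)  # opaque white ink, invisible on a white page

# Hypothetical 4x3 image: a black stroke plus a white decoy column.
image = [
    [T, K, T, W],
    [K, K, T, W],
    [T, K, T, T],
]
page_background = (255, 255, 255)  # assumed to be read from the page's HTML/CSS
mask = binarize(flatten(image, page_background), page_background)
for row in mask:
    print(row)
```

Whatever a human can see against the real background survives in `mask`, and whatever blends into it disappears, so the binarized result is exactly what an OCR step would need.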
No, he just hard-coded the number shapes. As long as the numbers used the same font as the example (and don't overlap), it should work just fine.
OCR would probably be more robust in general (for varying fonts and number shapes)... but it's simply absurd to call G-WAN's scheme a better captcha. More obscure and less targeted? Perhaps.
I thought about using OCR - just to over-engineer it. But I wanted to show how the characters are perfectly aligned and how clear the font is. I would like to understand what he thought, and why this captcha is supposed to be so special.
I haven't run the code, but I did completely read it.
It appears to me that he maps the character set GWAN uses for the captcha system, and as such, should work for any image generated by GWAN using the same character set as it simply identifies pixels matching the character set.
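For readers who have not opened the code, the pixel-matching idea described above can be sketched roughly like this. The 3x5 glyph shapes, the fixed cell width, and the one-pixel gap are hypothetical; a real solver would copy the shapes from the target font. This is not the OP's code, just an illustration of why perfectly aligned, non-overlapping characters need no OCR.

```python
# Hypothetical 3x5 glyphs ('#' = ink, '.' = background).
GLYPHS = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "2": ["###", "..#", "###", "#..", "###"],
    "3": ["###", "..#", "###", "..#", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}
WIDTH, HEIGHT, GAP = 3, 5, 1

def render(text):
    """Draw digits side by side on a fixed grid, standing in for the image."""
    rows = []
    for y in range(HEIGHT):
        rows.append(("." * GAP).join(GLYPHS[ch][y] for ch in text))
    return rows

def solve(rows):
    """Slice the image into fixed-width cells and look each one up."""
    lookup = {tuple(shape): ch for ch, shape in GLYPHS.items()}
    digits = []
    for x in range(0, len(rows[0]), WIDTH + GAP):
        cell = tuple(row[x:x + WIDTH] for row in rows)
        digits.append(lookup[cell])
    return "".join(digits)

image = render("2137")
answer = solve(image)
total = sum(int(d) for d in answer)  # the thread mentions summing the numbers
print(answer, total)
```

An exact-match lookup like this breaks as soon as glyphs overlap, shift, or change font, which is where general OCR becomes the more robust choice.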
I think this CAPTCHA algorithm is optional.
I'm using G-WAN, and I implemented another CAPTCHA algorithm in C.
I like it when someone proves that something is unsafe.
It's inspiring to improve our applications.
All the people who made something fantastic were, in some part, crazZzyYy.
But I think the G-WAN author needs to chill out and start rethinking.
G-WAN should be OK for, e.g., a CDN.
For hosters:
* need options to turn off the script languages
* support modules like in Apache
* support .htaccess
For me:
* add fcgid, or something better and faster if possible; show that PHP on G-WAN can be blazing fast
* add modules for sessions, MongoDB, Redis, etc.
* it should be an OK replacement for Apache
* recode a version for Windows for developers
Go to some IT conference, run 2 machines, with NginX fully optimized for speed, on a gigabit network, and show people how it works: CPU usage and requests per second, to compare.
<@merlin_> do we know this guy?
... snip ...
<@merlin_> """Today I had my first lightning talk at #BerlinSides_0x3"""
<@Kos> OH YEAH
<@Kos> that adude started following me a few weeks ago
<@C-Ps> berlin sids is pretty slick
... snip ...
<@Kos> I probably meat that dude at berlinsides
<@Kos> erm
<@Kos> met
<@savant42> meat, eh?
... snip ...
<@C-Ps> do you often meat men in berlin?
Thank you for providing the lulz, as well as the link to stiltwalker!
It was appreciation for your work and your link to ours, as well as an invitation. So I would go more on joy, unless of course you are afraid of hackers that like to drink and break things, not necessarily in that order, but often.
[0]: http://www.toddlahman.com/spam-free-wordpress/
[1]: http://www.raspberrypi.org/