Ever since an old site of mine (which has since been closed) was hammered with comment spam a few years ago, I've used a simple CAPTCHA script, and it has stopped comment spam in its tracks. The idea behind CAPTCHA scripts is that humans can read text embedded in an image far more easily than a computer can, which makes it very hard to pass a CAPTCHA test automatically.

You can read the source behind my personal CAPTCHA system here: captcha.php source. For those of you who are computationally gifted, you'll see that my script is very simple: a gradient is painted on a graphic, and then three random letters are overlaid in white.
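To make that description concrete, here is a rough sketch of the same idea in Python. It is not captcha.php: the image size, the gradient range, the six-letter alphabet, and the tiny 3x5 bitmap "font" are all stand-ins I made up so the example runs without an image library.

```python
import random

WIDTH, HEIGHT = 34, 14      # assumed image size, not taken from captcha.php
SPACING = 10                # assumed horizontal distance between letters
WHITE = (255, 255, 255)

# Hypothetical 3x5 bitmap font, just enough letters for a demonstration.
FONT = {
    "A": ["010", "101", "111", "101", "101"],
    "C": ["111", "100", "100", "100", "111"],
    "E": ["111", "100", "111", "100", "111"],
    "H": ["101", "101", "111", "101", "101"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def gradient_background():
    """Horizontal gradient that gets light but stops short of pure white."""
    rows = []
    for _ in range(HEIGHT):
        row = []
        for x in range(WIDTH):
            level = 40 + (200 * x) // (WIDTH - 1)   # runs from 40 to 240
            row.append((level, level, level))
        rows.append(row)
    return rows

def make_captcha(rng):
    """Return (answer, image) with three random letters stamped in white."""
    image = gradient_background()
    text = "".join(rng.choice(sorted(FONT)) for _ in range(3))
    for i, letter in enumerate(text):
        x0 = 3 + i * SPACING
        y0 = rng.randint(1, HEIGHT - 6)             # random vertical position
        for dy, bits in enumerate(FONT[letter]):
            for dx, bit in enumerate(bits):
                if bit == "1":
                    image[y0 + dy][x0 + dx] = WHITE
    return text, image

answer, image = make_captcha(random.Random())
print(answer)
```

The image here is just a list of rows of RGB tuples; a real script would hand the same pixels to an image library and send the result to the browser.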

I was recently thinking about how easy this would be to crack, owing to four main flaws:

  1. The text is in white, but the gradient never actually reaches white. To the human eye, it might as well be white, but to a computer's "eye", colors can be filtered with extreme precision.
  2. There are only ever three letters.
  3. The letters are always spaced exactly 10 pixels apart, though they "bounce" up and down based on a randomly generated y-variable.
  4. The letters are in a constant font face and size.
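
Taken together, the first three flaws mean the text can be isolated with almost no effort. The sketch below, in Python rather than the PHP I actually used, shows one way that could look: keep only pure-white pixels, cut the result into three fixed-width cells, and trim the empty rows so the vertical bounce disappears. The function names, the tiny synthetic image, and the zero x-offset are assumptions for illustration, not code from my script.

```python
WHITE = (255, 255, 255)
LETTER_WIDTH = 10   # fixed spacing between letters (flaw 3)
NUM_LETTERS = 3     # always three letters (flaw 2)

def binarize(pixels):
    """1 where the pixel is exactly white (text), 0 everywhere else (flaw 1)."""
    return [[1 if px == WHITE else 0 for px in row] for row in pixels]

def split_letters(mask, x_offset=0):
    """Cut the mask into NUM_LETTERS cells, each LETTER_WIDTH columns wide."""
    cells = []
    for i in range(NUM_LETTERS):
        left = x_offset + i * LETTER_WIDTH
        cells.append([row[left:left + LETTER_WIDTH] for row in mask])
    return cells

def trim_rows(cell):
    """Drop empty rows above and below the glyph to cancel the vertical bounce."""
    filled = [y for y, row in enumerate(cell) if any(row)]
    if not filled:
        return []
    return cell[filled[0]:filled[-1] + 1]

def demo_image():
    """Synthetic 30x6 image: a light gradient plus the same mark at three heights."""
    width, height = 30, 6
    image = [[(200 + x, 200 + x, 200 + x) for x in range(width)]
             for _ in range(height)]
    for column, top in ((2, 1), (12, 3), (22, 0)):
        image[top][column] = WHITE
        image[top + 1][column] = WHITE
    return image

cells = [trim_rows(c) for c in split_letters(binarize(demo_image()))]
print(cells[0] == cells[1] == cells[2])
```

After trimming, the three marks come out as identical bitmaps even though they were drawn at different heights, which is exactly what makes a constant font (flaw 4) so easy to match.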

Knowing these flaws, and armed with only the basic concepts of computer vision, I set about cracking my CAPTCHA. I used PHP for the task so that the script could run in a browser (and because image processing in it is very easy). In less than six hours, I had a system that cracked an image more than 75% of the time. Within two more hours, I had an algorithm that cracked my CAPTCHA every time. Source code of the CAPTCHA-breaking script: break.phps. Live example of the CAPTCHA-breaking script: /tools/break.php

A key point to note about this script, should you decide to read through it, is that it is very specific to my exact CAPTCHA. Almost any change, whether to the font face, the font size, the text color, or the background, would almost certainly cause it to stop working. No optical character recognition (OCR) is performed; for an image this simple and consistent, a simple hashing algorithm was more than sufficient. However, it would be relatively simple to implement line-thinning and shape-recognition algorithms to improve accuracy if such changes were made. In spite of these limitations, I feel this script proves a very valid point: my CAPTCHA is fundamentally flawed.
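
For readers who would rather not dig through the PHP, the hashing idea can be sketched in a few lines of Python. This is an illustration of the general approach, not a port of break.phps: the 3x5 glyph bitmaps are hypothetical, and SHA-256 is simply a convenient choice for the sketch. Because the font never changes, each letter always produces the same bitmap, so a digest of that bitmap can be looked up in a table built from known examples.

```python
import hashlib

# Hypothetical 3x5 glyph bitmaps standing in for letters in the fixed font.
SAMPLES = {
    "A": ["010", "101", "111", "101", "101"],
    "C": ["111", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def glyph_hash(bitmap):
    """Digest of a glyph bitmap; identical bitmaps always give identical digests."""
    return hashlib.sha256("\n".join(bitmap).encode("ascii")).hexdigest()

def build_table(samples):
    """Map digest -> letter, built once from a known example of every glyph."""
    return {glyph_hash(bitmap): letter for letter, bitmap in samples.items()}

def recognize(glyphs, table):
    """Look up each segmented glyph by digest; '?' marks an unknown bitmap."""
    return "".join(table.get(glyph_hash(g), "?") for g in glyphs)

TABLE = build_table(SAMPLES)
guess = recognize([SAMPLES["C"], SAMPLES["A"], SAMPLES["T"]], TABLE)
print(guess)

# One changed pixel produces a different digest, so the lookup fails outright.
damaged = ["011", "101", "111", "101", "101"]
print(recognize([damaged], TABLE))
```

The second lookup shows why the approach is so brittle: there is no notion of "close enough", so any change to the rendering invalidates the whole table.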

This exercise demonstrated the importance of having a reliable, sophisticated CAPTCHA system. This week I plan to switch to reCAPTCHA so that these problems can be avoided.