CAPTCHA’d Behind Enemy Lines

By Erhan K.


The CAPTCHA (a convoluted acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”) became quite popular in the earlier part of the decade, coinciding with the rise of email spam and scripted dictionary attacks. The CAPTCHA’s original purpose was to automate the process of distinguishing between a human doing something (registering an email address or answering a poll) and a program doing the same thing, by posing a puzzle that is easy for most humans to solve but hard for most programs. CAPTCHAs are generally image-based character recognition questions, but there are a number of CAPTCHAs that are the audio equivalent, or that ask you to pick images that relate to one another. A lot of research has gone into using CAPTCHAs (including some by this author) and into defeating them.

CAPTCHAs have a number of weaknesses:

  • From a security perspective, they rely on being “hard” for programs to solve. While CAPTCHAs can be made as hard as desired, the constraint that they must stay easy for humans prevents a program from getting carried away with the creation of a CAPTCHA (though sometimes they err on the side of caution). This is in contrast to, say, secure hashing techniques, where the harder a hash is to reverse, the better. If a human had to reverse the hash of a string, secure hashing would be a contradiction in terms.
  • CAPTCHAs require a large number of assumptions. For visual character recognition, they assume the human solving the puzzle can see fairly clearly, recognizes the character set, is not color-blind, and so on. Audio CAPTCHAs assume the human can hear clearly. Pattern-matching may rely on the human sharing the same culture as the designer: in a group of objects, “cow” may relate to “chicken” because they both belong on a farm, or to “horse” because they are roughly similar four-legged mammals. It also relies on the human recognizing what every image represents.
  • Most of all, CAPTCHAs are annoying. Really annoying. A person may not mind filling out forty textboxes and giving a website all sorts of personal information he or she would never give to another person, but when he or she scrolls to the bottom and is asked to prove he or she is a human by typing into a textbox the six weird-looking characters, out of a group of twelve, that have a picture of a cat in front of them, what has the world come to? Only a human would waste the time, one supposes, but goodness, watch out if the computer says the answer was wrong and, by implication, that he or she is possibly not a human. There is something about the combination of an arbitrary pop quiz and the vague insult of a program calling a human a program that has led to the collective sighs and rolling of eyes of would-be CAPTCHA-solvers across the world.
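The hashing contrast in the first point can be made concrete with a small sketch. It assumes SHA-256 purely for illustration (the post does not name an algorithm), and the helper names `sha256_hex`, `search_space` and `brute_force` are hypothetical. It shows that computing a hash stays a single cheap call no matter how hard reversal becomes, while reversing it by guessing gets more expensive with every extra character. A CAPTCHA has no such luxury, because its difficulty is capped by what a human will put up with.

```python
import hashlib
import string
from itertools import product


def sha256_hex(text: str) -> str:
    """Forward direction: one cheap call, regardless of input."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def search_space(alphabet_size: int, max_len: int) -> int:
    """Number of candidate strings of length 1..max_len."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


def brute_force(target_hex: str, alphabet: str, max_len: int):
    """Reverse direction: guess every string until one hashes to the target.

    Returns (match_or_None, number_of_guesses_made).
    """
    attempts = 0
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            attempts += 1
            candidate = "".join(chars)
            if sha256_hex(candidate) == target_hex:
                return candidate, attempts
    return None, attempts


if __name__ == "__main__":
    alphabet = string.ascii_lowercase
    target = sha256_hex("cat")  # hashing: one call
    found, attempts = brute_force(target, alphabet, max_len=3)
    print(f"recovered {found!r} after {attempts} guesses")
    # Each extra character multiplies the guessing work by the alphabet size.
    for n in (3, 6, 9, 12):
        print(n, search_space(len(alphabet), n))
```

Nobody on the legitimate side ever has to run `brute_force`, so the designer is free to make it as slow as possible. With a CAPTCHA, the human is the one doing the solving.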

The biggest problem with the CAPTCHA is its success. CAPTCHAs are popular because they worked for quite a while, and now they are the new username/password. Every website has one and requires it. It is the new black. But at least with a username and password, the lazy (but insecure) human can use the same one over and over, or head to a username/password sharing site and save the registration time. Not so with CAPTCHAs, because they are guaranteed to be different from site to site and get progressively harder as image recognition becomes more and more powerful. If the password had followed the same development cycle as the CAPTCHA, a password today would require 512 characters, need at least one character from sixty punctuation and language sets in the Unicode standard, and fail 25% of the time when you would swear it was entered correctly.

Ironically, CAPTCHAs are illustrative of the difference between perceived and substantive security. CAPTCHAs are perceived as very secure since they are so hard for humans to solve without due diligence. Yet they are that hard because artificial intelligence techniques have rendered previous CAPTCHAs obsolete (assuming the site upgraded its CAPTCHA system to counter the threat). CAPTCHAs are solved all the time, one way or another. And that is counting only the published results! But is there “A Better Way”? Can we build a better mousetrap? Can we add a third requirement to the CAPTCHA: that in addition to being easy for humans and hard for programs, it is not annoying or obtrusive? The answer is “yes”. The construction of such a “CAPTCHA 2.0” is left as an exercise for the reader.

—Erhan K., MCPD

(Images are monochrome CAPTCHAs that read “Denim Group”. The first uses warping, blurring and linear obfuscation. The second utilizes warping with low character segmentation.)


One Response to “CAPTCHA’d Behind Enemy Lines”

  1. Bill Shirley

    “Most of all, CAPTCHAs are annoying. Really annoying.”

    My WordPress blog will hold someone’s comment the first time they comment, but once I’ve approved it, will auto-allow all further comments. How hard is that?

    Blogger? CAPTCHA (that I sometimes fail, I suspect because I spent five-plus minutes writing, or was distracted from writing). It sometimes keeps me from commenting.

    I have made it a habit to now include the CAPTCHA text also at the bottom of my comment. I must share the annoyance.
