By Steve C. Orr
Robots are taking control of the Internet! Don't let them overwhelm your Web site with their unrelenting, self-serving probes. Now you can fight back with this free control that allows you to discriminate between human and computer visitors.
While this might sound like a sci-fi promotion for the next Terminator or Transformers movie, in a way, that ominous sci-fi future is already here. But don't be too afraid — just like in the movies, there are robots here to help us, too.
Not All Robots Are Bad
Robots are automated software systems that perform functions normally expected to be done by people. They send e-mail, surf the Web, send instant messages, etc. Such bots can be used for good. For example, Google's multitude of bots surf virtually all public Web sites and collect bits of information that it uses to help people search and find those Web sites. Google's bots are generally considered to be respected and responsible members of the Internet, because they abide by requests for privacy and, in exchange for the small amount of shared Internet resources they consume, provide a useful service that's valuable to nearly everybody.
The problem is that bad people can make bots, too — and their bots may do bad things. At their worst, bots have been used to take control of unsuspecting victims' computers. When a hacker "owns" enough computers through such a process, their army of zombie PCs can march across the Internet doing evil deeds, such as swiping credit card numbers, breaking into more computer systems, and taking down major Web sites by overwhelming them with phony Web page requests.
Then there are the bots in the middle ground. Maybe their creators didn't intend for them to do bad things, but, nevertheless, many people consider them to be nuisances — or worse. In the early days of the World Wide Web, the idea of bots caught fire, and this borderline kind of bot ran rampant and unhindered. This caused problems for many pioneers of the Internet.
Companies such as eBay and Ticketmaster were dominant on the Web in their respective industries; their tiny competitors were struggling, in comparison. They couldn't compete with Ticketmaster if they didn't have tickets to sell, and they couldn't compete with eBay without plenty of auctions to attract members. So they made bots that would simply use these big-name sites in the background. If you bought a ticket from their budding Web site, their bots would actually go to Ticketmaster's site and buy the ticket and then pass it on to you. This might not have worked out so badly if it weren't for the fact that these bots had to constantly probe the sites they were using to keep their lists of data fresh and in sync. In time, and as such tactics caught on, the number of bots increased, and their constant automated requests started to take a heavy toll on the servers they were accessing. Ticketmaster and eBay were getting irked that their competitors were making money by consuming their expensive data, resources, and computer systems without permission.
Friendly tactics were tried at first, such as asking these little companies to go away and buy their own computers. That didn't work — mostly because these little companies couldn't afford to compete with the big boys. Then legal actions were tried. But even today there are precious few laws regarding such matters, and back then there were virtually no laws covering such newfangled concepts. So what was the solution to be?
Yahoo had a bot problem, too. Its free e-mail system was being abused by spam bots that were automatically signing up for thousands of e-mail accounts and then exploiting them to send junk e-mail to people all over the world. Yahoo enlisted the help of a Carnegie Mellon University team that came up with a brilliant technological solution.
CAPTCHA to the Rescue
CAPTCHA effectively immunizes a Web site against bots. It stands for "Completely Automated Public Turing Test To Tell Computers and Humans Apart". Its foundation lies in the fact that while computers are brilliantly useful at some kinds of things (like calculating equations and tracking data), there are still many tasks that the human brain is much better equipped to handle.
For example, a human can glance at a painting such as the Mona Lisa and see in an instant — without even thinking about it — that it's a picture of a beautiful woman sitting down with a demure smile. On the other hand, even today's most cutting-edge optical recognition computer systems would be hard-pressed to even be able to tell you definitively that there is a human being in such a picture.
Optical character recognition systems (which "read" text from an image) fare a little better, primarily because there are a finite number of characters in the alphabet. However, even these programs work only semi-reliably under optimal conditions: clear black print on a plain white background. Colors, blurriness, fancy fonts, handwriting, symbols, embedded pictures, and crooked text are just a few of the common conditions that tend to confuse optical character recognition systems, making them unable to recognize the writing contained in a scanned image.
So these days, Web sites that wish to defend themselves against bots use the CAPTCHA concept: they display a picture that contains crooked, colorful, blurry text in varying fonts and ask the user to type in what they see (see Figure 1). It is a simple task for any legitimate human user, but a virtually insurmountable chore for bots. Therefore, a Web site can be reasonably sure that any user who makes it through its CAPTCHA gateway is a person, not a computer.
Figure 1: Ticketmaster implements an advanced CAPTCHA system that lets users in and keeps bots out.
Using CAPTCHASP
CAPTCHASP is a custom Web control I created to easily add CAPTCHA verification to any ASP.NET Web site (see Figure 2). Simply drag the CAPTCHASP.DLL onto your Visual Studio toolbox, then drag it from there onto any Web form and you've got instant CAPTCHA (see end of article for download details). Unlike most controls that generate images, there is no dependency on outside pages, resources, HTTP Handlers, or web.config settings. This is because of a novel development technique I used that will be detailed in next month's follow-up article that delves into CAPTCHASP's source code. Simple, standard xcopy deployment is all that's needed — and it's virtually impossible to mess up.
Figure 2: The CAPTCHASP control can be dragged onto any ASP.NET Web form to provide instant, highly customizable CAPTCHA verification.
After the control's been dropped on your Web form, the only other thing that's vitally important for you to know is that the control will raise its UserVerified event when the user has entered the correct codeword and therefore been proven to be a real human. From that event you can then choose to let the user in and perhaps set some kind of flag to remember that the user has successfully been verified.
This is all you really need to know to use CAPTCHASP, but I suggest you keep reading to learn how to take advantage of the many optional features the control offers (see Figure 3 for the complete list of CAPTCHASP events).
Figure 3: CAPTCHASP events.
Event | Parameter | Description |
UserVerified | n/a | This event is raised when the user has entered the correct code and therefore been proven to be human. Alternatively, you could ignore this event and instead call the Validate method followed by a check of the IsValid property. |
VerificationFailure | FailCount (Integer) ByVal (in) | This event is raised when the user has entered an incorrect code. The FailCount parameter specifies how many consecutive times they've failed to enter the correct code. An exception will be thrown after 15 invalid attempts, so you may want to handle this exception or deal with the suspected bot in some other way before that happens. |
CodeWordSelection | CodeWord (String) ByRef (in/out) | This event is raised when it's time to choose a codeword for display in the image portion of the control. The control's suggested codeword will be provided by the modifiable CodeWord parameter, unless the CodeWordType property is set to Custom, in which case you'll be required to provide your own codeword via the CodeWord parameter. |
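The event flow described in this table can be sketched in a few lines. What follows is a language-neutral illustration in Python, not CAPTCHASP's source code. The names CaptchaSketch and TooManyFailures and the callback parameters are hypothetical stand-ins for the control and its events. Two further details are assumptions: that the exception fires on the 15th consecutive failure, and that a successful entry resets the count.

```python
from typing import Callable, Optional

MAX_FAILURES = 15  # the article's stated limit on invalid attempts


class TooManyFailures(Exception):
    """Hypothetical exception type; the article does not name the real one."""


class CaptchaSketch:
    """Toy stand-in for the control's event flow (hypothetical, for illustration)."""

    def __init__(self,
                 code_word: str,
                 on_user_verified: Optional[Callable[[], None]] = None,
                 on_verification_failure: Optional[Callable[[int], None]] = None):
        self.code_word = code_word
        self.on_user_verified = on_user_verified
        self.on_verification_failure = on_verification_failure
        self.fail_count = 0    # consecutive failed attempts
        self.is_valid = False  # mirrors the IsValid property

    def validate(self, user_entry: str) -> None:
        # User input is compared case-insensitively, as the article describes.
        self.is_valid = user_entry.lower() == self.code_word.lower()
        if self.is_valid:
            self.fail_count = 0  # assumption: success resets the consecutive count
            if self.on_user_verified is not None:
                self.on_user_verified()
            return
        self.fail_count += 1
        if self.on_verification_failure is not None:
            self.on_verification_failure(self.fail_count)
        if self.fail_count >= MAX_FAILURES:
            raise TooManyFailures(f"{self.fail_count} consecutive failed attempts")


captcha = CaptchaSketch(
    "maple",
    on_user_verified=lambda: print("human verified"),
    on_verification_failure=lambda n: print(f"failed attempt {n}"),
)
captcha.validate("mapel")  # prints "failed attempt 1"
captcha.validate("MAPLE")  # prints "human verified"
```

A page that prefers not to handle events can follow the other route the table mentions: call validate and then read is_valid. Because the failure callback runs before the limit check, a handler has a chance to deal with a suspected bot before the exception is raised.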
Choosing CodeWords
The "CodeWord" is the CAPTCHA characters the user sees in the image and types in to be validated. The CodeWord may be a series of random characters, an actual word, or some combination thereof, depending on how CAPTCHASP has been configured.
By default, CAPTCHASP's CodeWordType property is set to its RandomCharacters enumeration value. This will cause the control to generate a random series of lowercase characters. The AddSymbols property, when set to its default value of True, will mix in some symbol characters, as well. The number of characters generated for each CodeWord is determined by the NumberOfCharacters property, which has a default value of 5, a minimum value of 3, and a maximum value of 10.
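To make these defaults concrete, here is a minimal Python sketch of random CodeWord generation under the limits just described (default 5, minimum 3, maximum 10 characters). It is an illustration only, not the control's implementation. The function name random_code_word and the SYMBOLS set are hypothetical, since the article does not say which symbols the control mixes in.

```python
import random
import string
from typing import Optional

MIN_CHARS, MAX_CHARS, DEFAULT_CHARS = 3, 10, 5
SYMBOLS = "#%&+=?@"  # hypothetical symbol set; the real one is not documented here


def random_code_word(number_of_characters: int = DEFAULT_CHARS,
                     add_symbols: bool = True,
                     rng: Optional[random.Random] = None) -> str:
    """Return a random CodeWord of lowercase letters, optionally mixed with symbols."""
    if not MIN_CHARS <= number_of_characters <= MAX_CHARS:
        raise ValueError(
            f"number_of_characters must be between {MIN_CHARS} and {MAX_CHARS}")
    # SystemRandom draws from the operating system's entropy source, which is
    # harder to predict than the default seeded generator.
    rng = rng or random.SystemRandom()
    alphabet = string.ascii_lowercase + (SYMBOLS if add_symbols else "")
    return "".join(rng.choice(alphabet) for _ in range(number_of_characters))


print(random_code_word())                      # five characters, letters and symbols
print(random_code_word(8, add_symbols=False))  # eight lowercase letters
```

Digits are left out of the alphabet on purpose, which matches the article's note that the control uses no numbers in its random CodeWords.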
If the CodeWordType property is set to the UseWordList enumeration value, CAPTCHASP will randomly choose a CodeWord from the comma-separated list of words in the WordList property. The WordList property comes pre-populated with a list of more than 150 English words, carefully chosen to be clear to humans but confusingly similar to one another from a bot's point of view. Of course, you can add to this list, modify it, replace it with your own list, or customize it in any way you wish.
In all cases, the control's randomly chosen CodeWord is sent as a parameter through the CodeWordSelection event. This gives the page code a chance to observe the value or request a different CodeWord be randomly generated by calling the GenerateNewCodeWord method. The CodeWord parameter is also modifiable so you can optionally replace the CodeWord with one of your own choosing.
However, if the CodeWordType property is set to the Custom enumeration value, the CodeWordSelection event's CodeWord parameter becomes required; the CAPTCHASP control will not take the time to randomly select a CodeWord — instead, the page will be expected to supply it with one. This option is nice for situations where you want to always use your own function to generate a custom CodeWord or dynamically grab one from a data source of your choosing (see Figure 4).
Figure 4: CodeWord-related properties.
Property | Property Type | Description |
CodeWordType | Enumeration | Specifies whether the CAPTCHASP control should automatically generate a random series of letters for the CodeWord, whether it should randomly choose a word from the WordList property, or whether you prefer to supply it with a custom CodeWord. |
AddSymbols | Boolean | When set to its default value of True and the CodeWordType property is set to its default of RandomLetters, symbol characters will be randomly mixed in with lowercase letters to create the CodeWord. |
NumberOfCharacters | Byte | When the CodeWordType property is set to its default of RandomLetters, this property specifies how many randomly generated characters each CodeWord should contain. |
WordList | String | A comma-separated list of words from which the control will randomly pick a CodeWord when the CodeWordType property is set to UseWordList. There are more than 150 pre-populated default words. |
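The three CodeWordType modes and the CodeWordSelection hook fit together roughly as in the Python sketch below. It is a hedged illustration, not the control's code. The enum member names, choose_code_word, and the on_selection callback are hypothetical. Python has no ByRef parameters, so the handler returns a replacement CodeWord instead, or None to accept the suggestion.

```python
import random
import string
from enum import Enum
from typing import Callable, Optional


class CodeWordType(Enum):  # hypothetical names mirroring the three modes
    RANDOM_CHARACTERS = 1
    USE_WORD_LIST = 2
    CUSTOM = 3


def choose_code_word(code_word_type: CodeWordType,
                     word_list: str = "",
                     number_of_characters: int = 5,
                     on_selection: Optional[Callable[[str], Optional[str]]] = None,
                     rng: Optional[random.Random] = None) -> str:
    """Pick a suggestion for the given mode, then let the page override it."""
    rng = rng or random.Random()
    if code_word_type is CodeWordType.USE_WORD_LIST:
        # WordList is a comma-separated string; ignore blanks and stray spaces.
        words = [w.strip() for w in word_list.split(",") if w.strip()]
        if not words:
            raise ValueError("word_list is empty")
        suggestion = rng.choice(words)
    elif code_word_type is CodeWordType.RANDOM_CHARACTERS:
        suggestion = "".join(rng.choice(string.ascii_lowercase)
                             for _ in range(number_of_characters))
    else:
        suggestion = ""  # Custom mode: the control offers no suggestion

    # Stand-in for the CodeWordSelection event: the handler sees the suggestion
    # and may return a replacement.
    replacement = on_selection(suggestion) if on_selection is not None else None
    final = replacement or suggestion
    if not final:
        raise ValueError("Custom mode requires the handler to supply a CodeWord")
    return final


print(choose_code_word(CodeWordType.USE_WORD_LIST, word_list="maple, river, stone"))
print(choose_code_word(CodeWordType.CUSTOM, on_selection=lambda s: "orchid"))  # prints "orchid"
```

Routing every mode through the same hook keeps the page in control: it can merely observe the suggested value, swap it for another, or supply the whole CodeWord from its own data source.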
For security reasons, CodeWords of at least three characters are required. For clarity and usability reasons, you should avoid supplying CodeWords of more than 10 characters. It's also good to be aware that the letter "l" and number "1" look confusingly similar; therefore, you may want to avoid using one or both of them. Likewise, the number zero ("0") is often confused with the letter "o". The CAPTCHASP control takes these issues into consideration when generating its random CodeWords. One way it does this is by not using any numbers. Additionally, user input is case-insensitive, so users need not worry about accidentally entering an uppercase "O" where a similar-looking lowercase "o" was expected. And the default WordList contains neither numbers nor the letter "l".
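If you supply your own CodeWords, a small pre-check can apply the same guidelines. The Python sketch below is one possible helper, not part of CAPTCHASP. The function name code_word_problems is hypothetical, and the decision to flag every digit plus the letter "l" simply mirrors the defaults described above.

```python
from typing import List

MIN_CHARS, MAX_CHARS = 3, 10


def code_word_problems(code_word: str) -> List[str]:
    """Return a list of guideline violations; an empty list means none were found."""
    problems = []
    if len(code_word) < MIN_CHARS:
        problems.append(f"shorter than {MIN_CHARS} characters")
    if len(code_word) > MAX_CHARS:
        problems.append(f"longer than {MAX_CHARS} characters")
    if any(ch.isdigit() for ch in code_word):
        # Digits such as "1" and "0" are easily mistaken for "l" and "o".
        problems.append("contains digits")
    if "l" in code_word.lower():
        problems.append('contains the letter "l"')
    return problems


def entries_match(user_entry: str, code_word: str) -> bool:
    """Compare case-insensitively, so "O" and "o" are treated as the same character."""
    return user_entry.lower() == code_word.lower()


print(code_word_problems("maze"))      # prints []
print(code_word_problems("ab"))        # prints ['shorter than 3 characters']
print(entries_match("MAZE", "maze"))   # prints True
```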
Cosmetic Customizations
The look and feel of CAPTCHASP can be configured in a variety of ways. Virtually every aspect of the control's appearance can be altered via properties and styles. Figure 5 demonstrates many of the control's optional user interface elements and customizations. I don't necessarily recommend altering the control's appearance to this much of an extreme, but it's nice to know you can.
Figure 5: Virtually every aspect of CAPTCHASP's appearance (including several optional elements) can be altered via properties and styles — even to ugly extremes such as this!
The optional title area (at the top of Figure 5 in green) can be shown by setting the TitleText property to the text you'd like to appear there. You can adjust the TitleStyle property elements to change how it looks in a variety of standard ways.
The InstructionText property can be used to change the text that is displayed above the textbox. The InstructionStyle property elements can be used to adjust its look in many ways, and have been used in Figure 5 to apply a fancy italic font.
A hyperlink can be displayed to explain in more detail why the user must go through this process. This "Why?" element (shown in blue in Figure 5) pops up a customizable message when clicked. The WhyStyle property elements can be used to adjust the look and feel of this hyperlink. The CodeWord entry textbox can also be adjusted in a variety of ways via the TextEntryStyle property elements. Figure 5 demonstrates this with purple text.
The Submit button (shown in orange in Figure 5) has ButtonStyle property elements associated with it to adjust its appearance. The ButtonText property can be used to change the button text from its default of Submit. The ShowSubmitButton Boolean property can be changed to False to make the button invisible, in case you'd like to implement your own submit button (or link) elsewhere on the page. Such a custom submit element would need to call the CAPTCHASP control's Validate method to trigger the control to check whether the user's entry is correct.
If the user enters the wrong code, the FailMessage will appear, as shown in Figure 5 in red. The FailMessageStyle property elements can be used to adjust visual aspects, and the FailMessageText property can be used to change what it says.
Finally, there is an optional ChangeCodeWord hyperlink that can be shown at the bottom of the control. Figure 5 displays this link highlighted in yellow via the ChangeCodeWordStyle property elements. The ChangeCodeWordText property can be used to change what the link says. The ShowChangeCodeWordLink property can be changed to True (from its default of False) to get this link to appear. When the user clicks this link the control will generate and display a new CodeWord. The FailCount property will be incremented each time this happens to help prevent abuse by any cherry-picking bots that feel brave enough to attempt to decode a CAPTCHA image. Instead of displaying this built-in link, you could implement your own link elsewhere on the page to change the CodeWord. It would simply need to call CAPTCHASP's server-side GenerateNewCodeWord method.
Conclusion
You should now understand what CAPTCHA is, as well as how and why it came to be. With the CAPTCHASP control you can now easily immunize your ASP.NET Web site to keep bots out and let legitimate users in.
The CAPTCHASP control is freely downloadable to everyone. You can download it or try the live demo right now.