CAPTCHA is an acronym for Completely Automated Public Turing Test To Tell Computers and Humans Apart. CAPTCHA may be described as a cyber security tool often used on websites. Websites that offer a service or collect data require the user to give inputs. This process can be thwarted with the use of bots. Bots can present themselves as humans with fake identities.
CAPTCHAs are designed in many ways but all of them have the same goal: to let humans into the system and deny admission to bots. They attempt to do this by presenting problems that are hard for bots to solve but relatively easy for humans to solve. With the coming of Artificial Intelligence (AI), computers have become smarter at solving CAPTCHAs. Therefore CAPTCHAs themselves have evolved to counteract intelligent bots.
Where do we need CAPTCHAs?
Automated programs or bots are fast compared to humans. This gives bots an unfair advantage. For example, bots can book tickets in bulk and then sell them later at a higher price in the black market. Likewise, bots can generate bulk votes and thereby skew the results of an online voting system. Bots can also be used to spam, troll or trigger DDoS attacks. Here are some use cases where CAPTCHAs are useful:
- Booking of tickets.
- Online voting system.
- Creating a new online/email account.
- Preventing dictionary attacks on password systems.
- Blocking spam comments that promote products on blog posts.
- Liking or sharing a web page via social networking sites.
- Protecting site contents from scrapers or search engine bots.
- Preventing bots from stealing personal information in chat rooms.
Why is a CAPTCHA sometimes called a reverse Turing test?
The Turing test was conceived in 1950 by Alan M. Turing as a way to check if machines were as intelligent as humans. If a human interrogator interacting electronically with a machine and another human cannot tell which of them is the machine, then the machine passes the Turing test.
With CAPTCHA, the interrogator is not a human but a computer. Hence the term reverse Turing test is sometimes used.
What are the different types of CAPTCHAs?
The following is a broad classification:
- Text-based: Users have to type the characters from a distorted text.
- Image-based: Users are asked to select images belonging to a certain group, such as those containing cats.
- Video-based: Users are shown short video clips and asked to answer a question based on them. NuCAPTCHA is an example.
- Audio-based: Users listen to an audio clip and type what they hear.
- Action-based: Users are asked to perform some action. reCAPTCHA V2 asks users to click a checkbox. MotionCAPTCHA asks users to trace a shape on the canvas. Dynamic Cognitive Game (DCG) is another example.
- Question-based: Users are presented a question that they have to answer correctly. Examples include "What is 1 + six?" or "Flower, resting, lawyer, campsite: the word starting with 'c' is?"
- Fun-based: Users are asked to solve a puzzle or play a game. Bongo is an example. PlayThru from Are You A Human is another example.
- Invisible: User verification is done in the background without requiring specific user inputs. Google introduced this with its Invisible reCAPTCHA.
- Hybrid: Combinations of the above.
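As an illustration of the question-based type, here is a minimal sketch of a generator for questions in the style of "What is 1 + six?". The function names, the word list and the choice of single-digit addition are assumptions made for this example and are not taken from any particular CAPTCHA product.

```python
import random

# Word forms used to mix digits and words, as in "What is 1 + six?"
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine"]


def make_question(rng: random.Random) -> tuple[str, int]:
    """Return a (question, answer) pair; the answer must stay on the server."""
    a = rng.randint(0, 9)
    b = rng.randint(0, 9)
    # Randomly render each operand as a digit or as a word.
    left = str(a) if rng.random() < 0.5 else NUMBER_WORDS[a]
    right = str(b) if rng.random() < 0.5 else NUMBER_WORDS[b]
    return f"What is {left} + {right}?", a + b


def check_answer(expected: int, user_input: str) -> bool:
    """Accept the answer as digits or as a number word, ignoring case and spaces."""
    text = user_input.strip().lower()
    if text.isdecimal():
        return int(text) == expected
    return text in NUMBER_WORDS and NUMBER_WORDS.index(text) == expected
```

A question this simple is easy for a bot to parse, so the sketch shows only the mechanics of the type and not a secure design.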
How are CAPTCHAs related to AI?
Computers are getting faster and having access to more memory than before. The algorithms that power them are getting better. So tasks that were once difficult for computers are becoming easier. This means CAPTCHAs that were previously unsolvable are becoming solvable by bots.
Advances in optical character recognition (OCR) have enabled bots to solve text-based CAPTCHAs. Image-based CAPTCHAs can be cracked due to advances in image processing, pattern recognition and object recognition. Question-based CAPTCHAs can be solved thanks to advances in natural language processing (NLP). Audio-based CAPTCHAs too can be solved due to advances in speech-to-text processing. With reCAPTCHA, humans assist machines in solving difficult problems. All these contribute to machines getting smarter every day.
Ultimately, we have to come up with better CAPTCHAs to defeat smarter bots. A CAPTCHA that's secure today may not be so tomorrow. reCAPTCHA's creator Luis von Ahn commented in 2012 that CAPTCHAs could become useless in another ten years.
What techniques are typically used to solve CAPTCHAs?
Note that text-based CAPTCHAs are presented as images. To solve them, the first step is to segment the text into individual characters. Removing noise is one way to improve OCR. Image transformations such as rotating, shifting, mirroring and warping can also lead to better OCR. With audio, waveform analysis can help in solving the CAPTCHA. Machine learning approaches that solve the segmentation and recognition problems simultaneously have been shown to give better results.
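The noise removal and segmentation steps can be sketched on a toy binary image, where 1 marks an ink pixel. This is a simplified illustration and not the method of any specific solver: it assumes characters are separated by empty columns, whereas real CAPTCHAs often overlap or join characters to defeat this. Both function names are made up for the example.

```python
def remove_specks(image: list[list[int]]) -> list[list[int]]:
    """Clear ink pixels that have no ink neighbour (isolated noise)."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            # Look at the up-to-8 surrounding pixels, staying inside the image.
            has_neighbour = any(
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbour:
                cleaned[r][c] = 0
    return cleaned


def segment_columns(image: list[list[int]]) -> list[tuple[int, int]]:
    """Split at empty columns; return (start, end) column ranges, end exclusive."""
    cols = len(image[0])
    ink = [any(row[c] for row in image) for c in range(cols)]
    segments, start = [], None
    for c, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = c  # a character begins
        elif not has_ink and start is not None:
            segments.append((start, c))  # a character ends
            start = None
    if start is not None:
        segments.append((start, cols))
    return segments
```

Each returned column range would then be cropped and passed to a character recogniser. Running `remove_specks` first keeps a stray noise pixel from being treated as a character of its own.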
Some sites don't implement CAPTCHA in a secure manner. They can be vulnerable due to reuse of session ID of a known CAPTCHA image.
Web services such as Death by CAPTCHA and 2Captcha have emerged to solve CAPTCHAs. Where the economics make sense, some of these use humans to solve the CAPTCHAs. Some web services forward the CAPTCHA to pornographic sites, where visitors need to solve it before they can view content. It's been argued that this is not really economical.
Are there guidelines to making good CAPTCHAs?
Some guidelines are worth noting:
- Accessibility: Give users options. Audio CAPTCHAs are suited for the visually impaired. Allow users to request another CAPTCHA if a particularly hard one comes up. With touchscreen interfaces, text-based CAPTCHAs are less suited compared to CAPTCHAs that require clicks or drag-and-drop actions.
- Dynamic generation: Avoid serving CAPTCHAs from a fixed database. CAPTCHAs should be generated dynamically. Avoid using trivial distortions. Avoid reuse of CAPTCHAs.
- Security: Avoid sending the solutions to the client along with the challenge.
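The security and reuse guidelines above can be sketched as a small server-side store. This is one possible design under stated assumptions: the class name, the in-memory dictionary and the 120-second lifetime are all hypothetical. The client only ever receives an opaque token, and each token can be checked once.

```python
import secrets
import time
from typing import Optional


class ChallengeStore:
    """In-memory store; a real deployment would likely use a shared cache."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl_seconds = ttl_seconds
        self._pending: dict[str, tuple[str, float]] = {}

    def issue(self, solution: str, now: Optional[float] = None) -> str:
        """Keep the solution on the server and return an opaque token for the client."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._pending[token] = (solution, now + self.ttl_seconds)
        return token

    def verify(self, token: str, answer: str, now: Optional[float] = None) -> bool:
        """Check an answer once; the token is consumed whether or not it matches."""
        now = time.time() if now is None else now
        entry = self._pending.pop(token, None)  # single use: blocks replay
        if entry is None:
            return False
        solution, expires_at = entry
        if now > expires_at:
            return False
        # Constant-time, case-insensitive comparison.
        return secrets.compare_digest(
            answer.strip().lower().encode(), solution.lower().encode()
        )
```

Because `verify` removes the token before comparing, a known challenge cannot be replayed, which addresses the reuse weakness described earlier.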
What are reCAPTCHAs?
reCAPTCHA was invented by researchers at Carnegie Mellon University in 2007. It used distorted characters of text. Rather than using just one word, it presented a pair of words to the user. One of these words was known to the CAPTCHA system but the other was not. Such unknown words, rather than being randomly generated, were picked from old books or articles that were being digitized. Since state-of-the-art OCR systems had difficulty deciphering these words, why not crowdsource this task to humans via CAPTCHA? It was with this idea that reCAPTCHA was born.
Thus reCAPTCHA in its original form helped in digitizing scanned text. In 2008, it was claimed that the system had transcribed 440 million words via 40,000 sites.
reCAPTCHA was acquired by Google in 2009. It evolved into No CAPTCHA reCAPTCHA (aka reCAPTCHA V2) in 2014. In 2017, Google introduced Invisible reCAPTCHA. With these newer versions, the CAPTCHA system tracks user behaviour to determine whether the interaction comes from a bot. This includes mouse movement, scrolling of the page, time taken to submit a form, and more.
How effective are CAPTCHAs in stopping bots?
One of the early CAPTCHAs, Gimpy, was used by Yahoo. In October 2002, a program was able to solve Gimpy CAPTCHAs. Back in 2005, W3C stated that many CAPTCHA systems could be solved with 88-100% accuracy. At PARC, one researcher was able to crack the Asirra CAPTCHA with 7.5% probability. Blogger Jeff Atwood reported that the CAPTCHAs of Yahoo, Hotmail and Google were broken in early 2008.
A Stanford team showed in 2010 that their Decaptcha tool could bypass many text-based CAPTCHAs. At a conference in 2012, a program was able to solve Google's audio CAPTCHA 99.1% of the time. In 2013, Vicarious AI claimed to be able to solve CAPTCHAs 90% of the time. Google itself reported that distorted text can be solved by AI with 99.8% accuracy. Using machine learning, researchers in 2014 were able to crack reCAPTCHA with a 33% success rate and Baidu's CAPTCHA with a 39% success rate.
This does not mean that CAPTCHAs are useless. It just means that current CAPTCHAs have to be better than what AI is capable of solving.
Are there alternatives to using CAPTCHAs?
The WCAG Working Group of W3C has a list of alternatives to using CAPTCHAs. Another list is by Karl Groves. It's important to note that alternatives may not work all the time since smart bots may find a way, now or in the future, to bypass them.
Honeypots and timestamp analysis are alternatives that have proven effective. Another option is to use an anti-spam service such as Akismet, Mollom or SBlam. It's possible to enforce user verification via emails or text messages sent to users' mobiles. While this may defeat bots, it reduces usability for humans. Game-based CAPTCHAs are still CAPTCHAs but they are less annoying and more fun for users. It has been claimed that PlayThru takes on average 10-12 seconds compared to 12 seconds for a text-based CAPTCHA.
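The honeypot and timestamp checks mentioned above can be combined in a few lines. This is a minimal sketch under assumptions: the hidden field name `website` and the 3-second threshold are hypothetical, and the render time is assumed to be recorded or signed on the server so that a bot cannot forge it.

```python
MIN_FILL_SECONDS = 3.0      # assumed threshold; tune against real traffic
HONEYPOT_FIELD = "website"  # hypothetical field hidden from humans via CSS


def looks_like_bot(form: dict[str, str], rendered_at: float, submitted_at: float) -> bool:
    """Flag a submission if the hidden field is filled or the form came back too fast."""
    # Humans never see the honeypot field, so any value in it suggests a bot.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Forms completed faster than a human could plausibly type are suspicious.
    return (submitted_at - rendered_at) < MIN_FILL_SECONDS
```

Neither check asks anything of legitimate users, which is their main appeal over a visible CAPTCHA. A bot that skips hidden fields and waits before submitting would still get through.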
What are the accessibility issues surrounding CAPTCHAs?
In response to smarter bots, CAPTCHAs have gotten more sophisticated. Unfortunately, this makes them harder for humans as well. A study from 2009 found that while CAPTCHAs reduced spam, they also reduced conversion rates. Another study with the Animoto web app showed that conversion rates were better by 33% without CAPTCHAs. A survey from 2010 showed that audio CAPTCHAs are hard for non-native speakers of English. Even with text-based CAPTCHAs, some text can be hard to solve. In 2000, the success rate was 97% but this dropped to 92% in 2012.
It has been claimed that CAPTCHAs ignore the issues that senior citizens and the visually impaired face. A 2009 study with visually impaired users showed that with audio CAPTCHAs the success rate was only 45% and it took users 65 seconds to solve one.
Harry Brignull has even questioned the approach, "Using a CAPTCHA is a way of announcing to the world that you’ve got a spam problem, that you don’t know how to deal with it, and that you’ve decided to offload the frustration of the problem onto your user-base."
Can you name some providers of CAPTCHAs?
Google's reCAPTCHA is the best known. As of March 2017, more than a million websites are using it. Also, 11.2% of the top 10k sites are using it. NoMoreCaptchas is a possible alternative to Invisible reCAPTCHA since it does not require explicit input from users.
PICATCHA is an image-based CAPTCHA that also serves as an advertising medium. Microsoft had a research project named Asirra that used an image-based CAPTCHA. Confident CAPTCHA claims a 96% success rate and usage of 50 million verifications per month. Other examples are Ironclad CAPTCHA, PlayThru, NuCAPTCHA and Solve Media. The site captchas.net is a free service. BotDetect CAPTCHA uses image and sound. JCAPTCHA, implemented in Java, can be downloaded and integrated into websites. Dice Captcha shows pictures of dice.
In my automated tests, how can I test form submissions that have CAPTCHAs?
One approach is to disable CAPTCHAs during testing. Depending on the system, this could be done in code or via a flag in the database. Attempting to automatically solve CAPTCHAs is a fundamentally flawed approach. If test automation can solve the CAPTCHA, it suggests that the CAPTCHA is not strong enough to prevent bots from passing themselves off as humans.
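Disabling the CAPTCHA through a flag might look like the following sketch. The environment variable `CAPTCHA_DISABLED`, the function names and the form field are hypothetical; the same idea applies if the flag lives in a database or a configuration file.

```python
import os
from typing import Callable, Mapping, Optional


def captcha_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """CAPTCHA stays on unless the hypothetical CAPTCHA_DISABLED flag is set to "1"."""
    env = os.environ if env is None else env
    return env.get("CAPTCHA_DISABLED", "0") != "1"


def handle_signup(form: Mapping[str, str],
                  verify_captcha: Callable[[str], bool],
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Run the CAPTCHA check only when it is enabled for this environment."""
    if captcha_enabled(env) and not verify_captcha(form.get("captcha", "")):
        return "rejected"
    return "accepted"
```

The check defaults to enabled, so a missing or mistyped flag fails safe. The flag should only ever be set in test environments.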
In October 2018, Google launched reCAPTCHA v3. This doesn't require users to solve any CAPTCHA and gives greater control to website owners. reCAPTCHA v3 returns a score (1.0 for a good interaction, 0.0 for a likely bot). Based on this score, website owners can take appropriate action. In March 2019, it was reported that reCAPTCHA v3 had been partially tricked by an AI program using Reinforcement Learning.
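How a site might act on such a score can be sketched as a simple threshold function. The thresholds 0.7 and 0.3 and the three action names are assumptions chosen for illustration, not values recommended by Google. The sketch assumes the score has already been obtained from the verification service.

```python
def action_for_score(score: float, allow_at: float = 0.7, challenge_at: float = 0.3) -> str:
    """Map a score in [0.0, 1.0] to a site-defined action (thresholds are assumptions)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= allow_at:
        return "allow"       # likely a human: let the request through
    if score >= challenge_at:
        return "challenge"   # uncertain: e.g. ask for email verification
    return "block"           # likely a bot
```

A site would typically tune the thresholds per action, for example being stricter on sign-up than on page views.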
- Atwood, Jeff. 2008. "CAPTCHA is Dead, Long Live CAPTCHA!" Coding Horror. March 4. Accessed 2017-03-20.
- Brignull, Harry. 2011. "F**K CAPTCHA." 90 Percent of Everything. March 25. Accessed 2017-03-21.
- BuiltWith. 2017. "reCAPTCHA Usage Statistics." March. Accessed 2017-03-21.
- Burling, Stacey. 2012. "CAPTCHA: The story behind those squiggly computer letters." Phys.org. June 15. Accessed 2017-03-20.
- Bursztein, Elie, Steven Bethard, Celine Fabry, John C. Mitchell, and Dan Jurafsky. 2010. "How Good are Humans at Solving CAPTCHAs? A Large Scale Evaluation." Proceedings of the 2010 IEEE Symposium on Security and Privacy (SP '10). Accessed 2017-03-20.
- Bursztein, Elie, Jonathan Aigrain, Angelika Moscicki, and John C. Mitchell. 2014. "The End is Nigh: Generic Solving of Text-based CAPTCHAs." Usenix Workshop on Offensive Technology. Accessed 2017-03-20.
- Captcha. 2017. Accessed 2017-03-20.
- Chow, R., P. Golle, M. Jakobsson, X. Wang, and L. Wang. 2008. "Making CAPTCHAs clickable." Ninth Workshop on Mobile Computing Systems and Applications (HotMobile 2008). ACM, pp. 91-94. Accessed 2017-03-20.
- Confident Technologies. 2017. "Confident CAPTCHA." Accessed 2017-03-21.
- CyLab. 2017. "The reCAPTCHA Project." Accessed 2017-03-20.
- Elson, Jeremy, John (JD) Douceur, Jon Howell, and Jared Saul. 2007. "Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization." Microsoft Research. October 1. Accessed 2017-03-21.
- Engber, Daniel. 2014. "Who Made That Captcha?" NY Times. January 17. Accessed 2017-03-20.
- Garrity, Michael. 2013. "7 Innovative Solutions to CAPTCHA User Attention." Website Magazine. June 2. Accessed 2017-03-21.
- Golle, P. 2008. "Machine learning attacks against the ASIRRA CAPTCHA." 15th Annual ACM Conference on Computer and Communications Security (CCS 2008). ACM, pp. 535-542. Accessed 2017-03-20.
- Google Developers. 2018. "reCAPTCHA v3." reCAPTCHA Docs, October 29. Accessed 2019-03-27.
- Google Developers. 2019. "Choosing the type of reCAPTCHA." reCAPTCHA Docs, February 11. Accessed 2019-03-27.
- Griffin, Andrew. 2017. "Google kills off the Captcha, ensuring humans don't need to see the most annoying thing on the internet." The Independent. March 13. Accessed 2017-03-20.
- Groves, Karl. 2012. "CAPTCHA-less Security." April 3. Accessed 2017-03-20.
- Henry, Casey. 2009. "CAPTCHAs' Effect on Conversion Rates." Moz. July 17. Accessed 2017-03-21.
- Hill, David J. 2012. "Artificial Intelligence Will Defeat CAPTCHA — How Will We Prove We're Human Then?" SingularityHub. August 28. Accessed 2017-03-21.
- Horowitz, Kate. 2016. "The Surprisingly Devious History of CAPTCHA." Mental Floss. June 21. Accessed 2017-03-20.
- Kobie, Nicole. 2019. "Google's reCAPTCHA test has been tricked by artificial intelligence." Wired, March 25. Accessed 2019-03-27.
- Lillibridge, Mark D., Martin Abadi, Krishna Bharat, and Andrei Z. Broder. 1998. "Method for selectively restricting access to computer systems." US Patent US 6195698 B1. Filed April 13. Accessed 2017-03-20.
- Liu, Wei. 2018. "Introducing reCAPTCHA v3: the new way to stop bots." Webmaster Central Blog, Google, October 29. Accessed 2019-03-27.
- May, Matt. 2005. "Inaccessibility of CAPTCHA." W3C Working Group Note 23. November. Accessed 2017-03-20.
- MotionCAPTCHA. 2017. Accessed 2017-03-21.
- Munsell, Andrew. 2012. "Captchas Are Becoming Ridiculous." Blog. July 28. Accessed 2017-03-21.
- Pachal, Pete. 2013. "Captcha FAIL: Researchers Crack the Web's Most Popular Turing Test." Mashable. October 28. Accessed 2017-03-21.
- Popov, Leonid. 2016. "Quick history of CAPTCHA." LinkedIn. May 30. Accessed 2017-03-20.
- Robinson, Sara. 2002. "Human or Computer? Take This Test." NY Times. December 10. Accessed 2017-03-21.
- ScienceDaily. 2014. "Better than CAPTCHA: Improved method to let computers know you are human." University of Alabama at Birmingham. August 25. Accessed 2017-03-20.
- Shet, Vinay. 2014. "Are you a robot? Introducing 'No CAPTCHA reCAPTCHA'". Google Security Blog. December 3. Accessed 2017-03-20.
- Takahashi, Dean. 2012. "Are You a Human replaces annoying CAPTCHAs with games." Venture Beat. May 21. Accessed 2017-03-21.
- TechnoBlog. 2015. "Google no Captcha + INVISIBLE reCaptcha – First Experience Results Review." TechnoBlog, January 23. Updated 2019-03-10. Accessed 2019-03-27.
- TextCaptcha. 2017. TextCaptcha v41. Accessed 2017-03-20.
- The CAPTCHA Project. 2017. Bongo. Accessed 2017-03-20.
- Van Reijmersdal, Niels. 2016. "How to fill CAPTCHA using Test automation?" StackExchange. February 15. Accessed 2018-01-18.
- Von Ahn, Luis. 2008a. "We have a blog!" reCAPTCHA Blog. December 7. Accessed 2017-03-21.
- Von Ahn, Luis. 2008b. "New Audio reCAPTCHA." reCAPTCHA Blog. December 7. Accessed 2017-03-21.
- Von Ahn, Luis, and Will Cathcart. 2009. "Teaching computers to read: Google acquires reCAPTCHA." Google Blog. September 16. Accessed 2017-03-21.
- Von Ahn, Luis, Benjamin Maurer, Colin McMillen, David Abraham, and Manuel Blum. 2008. "reCAPTCHA: Human-Based Character Recognition via Web Security Measures." Science. Vol. 321. Issue 5895. September 12. Accessed 2017-03-21.
- WCAG Working Group. 2015. "Captcha Alternatives and thoughts." December 15. Accessed 2017-03-21.
- Wagner, Janet. 2017. "Google reCAPTCHA v1 API Shutting Down in March 2018." ProgrammableWeb, October 24. Accessed 2019-09-24.
- Wikipedia. 2017. "Turing test." March 20. Accessed 2017-03-20.
- World Heritage Encyclopedia. 2017. "Reverse Turing test." Accessed 2017-03-20.
- Yale, Brad. 2014. "The CAPTCHA: A History, A Problem, Possible Solutions." Pearson InformIT. September 10. Accessed 2017-03-20.
- Yeend, Howard. 2005. "Breaking CAPTCHA without OCR." Pure Mango Blog. November 30. Accessed 2017-03-20.
- Thompson, Clive. 2007. "For Certain Tasks, the Cortex Still Beats the CPU." Wired. June 25. Accessed 2017-03-21.
- Bushell, David. 2011. "In Search Of The Perfect CAPTCHA." Smashing Magazine. March 4. Accessed 2017-03-21.
- Concannon, Joe. 2015. "9 CAPTCHA Alternatives That Won’t Wreck Your UX." Digital Telepathy. August. Accessed 2017-03-20.
- Saric, Matej. 2013. "Strict Standards: Introduction to CAPTCHA Accessibility." BotDetect CAPTCHA. October. Accessed 2017-03-21.
- Von Ahn, Luis, Manuel Blum, Nicholas J. Hopper and John Langford. 2003. "CAPTCHA: Using Hard AI Problems for Security." May. Accessed 2017-03-21.
- Rao, Leena. 2009. "Google Acquires reCaptcha To Power Scanning For Google Books And Google News." TechCrunch. September 16. Accessed 2017-03-21.
- Reverse Turing Test
- Turing Test
- Optical Character Recognition
- Artificial Intelligence
- Computer Vision