By Scott Hamilton
Have you ever been asked by a website to prove you are not a robot? This sometimes happens when you are filling out an online form, especially when taking a survey, registering a product or creating a free email account. The check is in place to prevent hackers from using computer code to generate multiple accounts and then using those accounts to send mass emails. The technology behind this is called CAPTCHA, an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart.
The technology initially required the user only to click a button on the screen. It was easy to tell a real mouse click by the randomness of the click within the space of the button. Computers were bad at picking a random place to click within the button image. In recent years, randomness in computers has vastly improved, making it harder to tell a human click from a computer-simulated click. As artificial intelligence improves, it is becoming more and more difficult to distinguish between a computer and a human.
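To make the click idea concrete, here is a minimal sketch in Python of how a site might flag clicks that land in nearly the same spot every time. It is only an illustration: the function name looks_scripted and the min_spread threshold are my own assumptions, not part of any real CAPTCHA product, and a real system would weigh many more signals.

```python
from statistics import pstdev


def looks_scripted(clicks, min_spread=1.0):
    """Return True when click positions are suspiciously uniform.

    clicks: list of (x, y) pixel offsets inside the button, one per attempt.
    min_spread: hypothetical threshold in pixels; a spread below it on both
    axes is treated as machine-like.
    """
    if len(clicks) < 2:
        return False  # a single click gives no evidence either way
    xs = [x for x, _ in clicks]
    ys = [y for _, y in clicks]
    # A person rarely hits the same pixel twice; a simple script often does.
    return pstdev(xs) < min_spread and pstdev(ys) < min_spread
```

A script that always clicks the exact center of the button, for example [(50, 15)] * 5, would be flagged, while a scatter of human clicks such as [(42, 11), (57, 19), (49, 14), (61, 9)] would not. A script that adds good random jitter would pass this check, which is exactly why this approach stopped being enough.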
The next step was to add the identification of text in a blurred, smeared or scratched image. This is fairly easy for a person to do; we are great at picking out meaningful information from damaged documents. Up until the early 2000s, computers were struggling to recognize and read characters from even very clean documents because of the variations in font size, letter shapes, and other slight differences in character and word formation. This made reading “damaged” text the perfect CAPTCHA.
In the early 2010s, researchers at Google and Carnegie Mellon began, as part of a digital library project, to scan and read millions of pages of text from out-of-print and damaged books. As they improved the character recognition algorithms used to digitize libraries, a side effect was the creation of algorithms that could easily fool the text CAPTCHA into believing the software was a human.
In the backend of every CAPTCHA puzzle, computer AI algorithms were running to compile data from the human input to improve how the computer recognized the images. These many CAPTCHA images were great training datasets for AI and vastly improved a computer’s ability to solve the CAPTCHA problems. Researchers knew that as the CAPTCHA tests got more and more difficult, humans would stop wanting to do them at all and computers would eventually surpass humans in the ability to solve the problems, because CAPTCHAs are analytical in nature and computers are better analysts overall.
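One simple way human answers can be compiled into training data is by majority vote: show the same image to several people and keep the answer most of them agree on. The Python sketch below illustrates that idea only. The function name consensus_label and the min_votes and min_agreement thresholds are assumptions I made for the example, and I am not claiming this is how any particular CAPTCHA service works.

```python
from collections import Counter


def consensus_label(answers, min_votes=3, min_agreement=0.6):
    """Return the agreed-upon label for one image, or None if there is no consensus.

    answers: strings typed by different people for the same image.
    min_votes, min_agreement: hypothetical thresholds for this sketch.
    """
    # Ignore blank answers and differences in case or stray spaces.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # too few answers to trust
    label, count = Counter(normalized).most_common(1)[0]
    # Accept the top answer only when enough people agree on it.
    return label if count / len(normalized) >= min_agreement else None
```

With the answers "Morning", "morning ", "mourning" and "morning", three of four people agree, so the image would be labeled "morning" and could be added to a training set. If the answers are split, the function returns None and the image needs more votes.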
In 2014, after using text image CAPTCHAs to train an AI, Google created a competition between human and computer using highly distorted text images. The computer arrived at the correct result 99.8 percent of the time, while the human participants were correct only 33 percent of the time. This result created the need for a new CAPTCHA method.
This new method is the image recognition that we see today, called NoCaptcha ReCaptcha, which presents the computer or human with a series of images and asks them to identify images of fire hydrants, street signs, storefronts, cats, dogs or any other object. This is still fairly easy for a human, even though sometimes telling an awning on a house from a storefront still messes with me. Computers are catching up rapidly as data about these image-based CAPTCHAs are used to train AI algorithms. It will not be long before computers surpass humans again, leaving us needing a new methodology to determine that the person behind the keyboard is in fact a person and not a robot.
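For the curious, checking an image puzzle can be sketched as comparing the tiles a user selected with the tiles known to contain the object. The Python below is a simplified illustration under my own assumptions: the function name grade_selection and the max_mistakes allowance are hypothetical, and real services use far more than this single comparison.

```python
def grade_selection(selected, expected, max_mistakes=0):
    """Return True when the chosen tiles match the known answer closely enough.

    selected: tile numbers the user clicked.
    expected: tile numbers known to contain the object, such as a fire hydrant.
    max_mistakes: hypothetical allowance for missed or extra tiles.
    """
    selected, expected = set(selected), set(expected)
    # The symmetric difference counts both missed tiles and wrongly chosen ones.
    mistakes = len(selected ^ expected)
    return mistakes <= max_mistakes
```

Allowing a mistake or two is a design choice that would forgive a person for my awning problem: grade_selection({0, 4}, {0, 4, 7}) fails with the strict default but passes with max_mistakes=1.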
Until next week, stay safe and learn something new.
Scott Hamilton is a Senior Expert in Emerging Technologies at ATOS and can be reached with questions and comments via email to email@example.com.