Completely Automated Public Turing test to tell Computers and Humans Apart

May 2018

David Magalhães



About me


Software Engineer @

Security Analyst @



  • The (image distortion) captcha was invented in 1997
  • AltaVista used it on its website in 1997
  • PayPal started using it in 2001
  • The term "CAPTCHA" was first used in 2003

Where can we find it?


  • Websites are common places to encounter Captchas.



  • Although it is less common, some apps also implement Google ReCaptcha to keep bots out.

How to bypass?

Where to start?

  • Verify whether the webpage implements the captcha correctly.
  • Optical Character Recognition (OCR) software is available for captchas.

Some types of attack

  • Static CAPTCHA Identifier
  • Fixation Attack
  • Re-Riding Attack
  • OCR Bruteforce

Image Processing

Human Workforce

    • $2 USD per 1,000 captchas
    • 17 s solve speed
    • $3 USD per 1,000 captchas
    • 49 s solve speed

Google ReCaptcha

Attacks and Responses

"I'm not a human: Breaking the Google reCAPTCHA"

March, 2016

  • Plays with cookies, user agent, etc.
  • Tricks the website address check with localhost.
  • 2,500 checkbox captchas solved per hour.
  • Weekends had less blocking.
  • Leverages Google Reverse Image Search, along with other machine learning software.
  • Images are reused.

Voice Recognition

March, 2017

  • Uses the SpeechRecognition library for Python
    • Google Speech Recognition
    • Google Cloud Speech API
    • Houndify API
    • Microsoft Bing Voice Recognition

Bypass via HTTP parameter pollution

March, 2018

POST /recaptcha/api/siteverify



Around ~3% of the integrations with reCAPTCHA were vulnerable.
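
A minimal sketch of how parameter pollution can arise, assuming the integration builds the siteverify request body by plain string concatenation. The helper names (naiveBody, safeBody, countSecrets) and the secret values are hypothetical and do not come from the slides. If the user-supplied token is not URL-encoded, an attacker can smuggle a second secret parameter into the request; encoding the value keeps it a single parameter.

```javascript
// Hypothetical helpers: none of these names come from the slides.

// Vulnerable pattern: the user-controlled token is concatenated unencoded.
function naiveBody(userToken, secret) {
  return "response=" + userToken + "&secret=" + secret;
}

// Safer pattern: URLSearchParams percent-encodes "&" and "=" inside values.
function safeBody(userToken, secret) {
  return new URLSearchParams({ response: userToken, secret: secret }).toString();
}

// Count how many "secret" parameters a form parser would see in the body.
function countSecrets(body) {
  return new URLSearchParams(body).getAll("secret").length;
}

const injected = "anything&secret=attacker-chosen-secret";
console.log(countSecrets(naiveBody(injected, "site-secret"))); // 2
console.log(countSecrets(safeBody(injected, "site-secret")));  // 1
```

Which of the duplicated parameters the receiving side honours depends on its parser, so the safest design is to never let a duplicate reach it.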

Google Response

  • Request frequency
  • From normal, clear voice audio to nearly imperceptible voice audio (with distortions)
  • From clear images of cars, street signs, bridges, etc. to noisy, lower-resolution images
  • From a fixed set of images to multiple images appearing with an added delay

Incremental Difficulty

  • Raise the number of digits in the voice captcha.
  • Tweak the Advanced Risk Analysis System.
    • Less tolerance for wrong answers / wrongly checked image boxes.
  • Avoid image repetition.


How to implement?

Defending against possible attacks.


  • Use CloudFlare DNS

Implement on the code

  • Go to Google ReCaptcha page.
  • Follow instructions.
  • Adjust security.

Verify ReCaptcha

  • Get the g-recaptcha-response token from the user.
  • Verify the submitted token on the back end.
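
The two steps above can be sketched as follows. This is a hedged outline, not an official client: verifyRecaptcha, expectedHostname, and fetchImpl are hypothetical names, and the www.google.com host in front of the /recaptcha/api/siteverify path is an assumption. The function relies on the success and hostname fields of the JSON response.

```javascript
// Sketch only: `verifyRecaptcha`, `expectedHostname`, and `fetchImpl` are
// hypothetical names, and the www.google.com host is an assumption.
const VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

async function verifyRecaptcha(secret, token, expectedHostname, fetchImpl = fetch) {
  // No g-recaptcha-response submitted: reject without calling the API.
  if (!token) return false;

  // URLSearchParams percent-encodes both values in the form body.
  const body = new URLSearchParams({ secret: secret, response: token });
  const res = await fetchImpl(VERIFY_URL, { method: "POST", body: body });
  const data = await res.json();

  // Accept only an explicit success reported for the hostname we expect.
  return data.success === true && data.hostname === expectedHostname;
}
```

Passing fetchImpl as a parameter keeps the function testable with a stub instead of a live network call.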


Verify ReCaptcha

{
  "success": true|false,
  "challenge_ts": timestamp,  // timestamp of the challenge load (ISO format yyyy-MM-dd'T'HH:mm:ssZZ)
  "hostname": string,         // the hostname of the site where the reCAPTCHA was solved
  "error-codes": [...]        // optional
}

Why ReCaptcha?

  • State of the art CAPTCHA system.
  • Always evolving.
  • Easy to implement and to use.

Breaking Captcha

The Story

Once upon a time

A website with valuable information that didn't ask for a captcha.

And 24 hours later ...

... and 100,000 requests, something weird appeared.

But something was weird

The AJAX request didn't contain the CAPTCHA response.

  • Old endpoint still enabled.
  • New endpoint checked captcha.

1 year later, they fixed it

And for a couple of months, I didn't have a solution for this ...

... until ...

$ aptitude search ocr


GNU Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method. It reads a bitmap image in pbm or pgm formats and produces text in byte (8-bit) or UTF-8 formats.


Ocrad includes a layout analyser able to separate the columns or blocks of text normally found on printed pages.

Prepare Image first

  • Create a cleaner image before converting the image to text

    • Remove background
    • Remove line
    • Connect missing space
    • Remove noise
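
As an illustration of two of these steps (background removal and noise removal), here is a minimal sketch that works on a grayscale image stored as rows of 0-255 values. It is not the pipeline used in the talk; the helper names binarize and removeIsolated and the threshold value are assumptions.

```javascript
// Hypothetical helpers; the image is an array of rows of 0-255 gray values.

// Remove background: pixels darker than `threshold` become ink (1),
// everything else becomes paper (0).
function binarize(gray, threshold) {
  return gray.map(row => row.map(v => (v < threshold ? 1 : 0)));
}

// Remove noise: clear ink pixels that have no ink among their 8 neighbours.
function removeIsolated(bin) {
  const height = bin.length;
  return bin.map((row, y) => row.map((v, x) => {
    if (v === 0) return 0;
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        if (dx === 0 && dy === 0) continue;
        const ny = y + dy;
        const nx = x + dx;
        const inside = ny >= 0 && ny < height && nx >= 0 && nx < row.length;
        if (inside && bin[ny][nx] === 1) return 1; // has an ink neighbour
      }
    }
    return 0; // isolated speck
  }));
}

const gray = [
  [255, 10, 10, 255],
  [255, 255, 255, 255],
  [255, 255, 255, 255],
  [10, 255, 255, 255],
];
const cleaned = removeIsolated(binarize(gray, 128));
console.log(cleaned);
```

In this example the two adjacent dark pixels in the first row survive, while the lone dark pixel in the last row is treated as noise and cleared.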

Get various codes and choose the best

  • NoLine Corrected
  • NoBackground Corrected
  • NoBackground Corrected with _
  • Validating the code obtained


for ($i = 1; $i <= 9; $i++) {
    // Run ocrad once per threshold (0.1 to 0.9) and correct each result
    $v[$i] = self::correctCaptcha(
        trim(shell_exec("ocrad --threshold=0.".$i." ".escapeshellarg($newFile)))
    );
}

Use various thresholds to obtain a better result

Improve final solution

  • Check that the length is 5.
  • Check that the characters are lowercase.
  • Limit the result to the allowed alphanumeric range.
  • For each "_" character, try to find a replacement character in one of the 9 threshold solutions.
  • Remove "blank" characters.
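
The checks above could be combined roughly as follows. This is a sketch under assumptions: mergeCandidates is a hypothetical name, candidates stands for the OCR outputs of the different thresholds, and the allowed alphabet (lowercase letters and digits) is a guess, since the slides do not list the exact range.

```javascript
// Hypothetical sketch: `candidates` holds the OCR output for each threshold.
const EXPECTED_LENGTH = 5;     // from the slides: the code has 5 characters
const ALLOWED = /^[a-z0-9]$/;  // assumed alphabet: lowercase letters and digits

function mergeCandidates(candidates) {
  // Remove "blank" characters, then keep only candidates of the expected length.
  const usable = candidates
    .map(c => c.replace(/\s/g, ""))
    .filter(c => c.length === EXPECTED_LENGTH);
  if (usable.length === 0) return null;

  let result = "";
  for (let pos = 0; pos < EXPECTED_LENGTH; pos++) {
    // "_" and other invalid characters fail the ALLOWED check, so the first
    // candidate that offers a valid character at this position fills the gap.
    const hit = usable.map(c => c[pos]).find(ch => ALLOWED.test(ch));
    if (hit === undefined) return null; // every threshold failed here
    result += hit;
  }
  return result;
}

console.log(mergeCandidates(["a_c4e", "ab_4e", "abc_"])); // "abc4e"
```

Returning null when a position cannot be filled lets the caller skip the captcha and request a new one instead of submitting a guess.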

Table mapping

Run on some captchas with known solutions ...

? = e %% = 2y y = 2 IT = n T = 7
W = w rf = d ] = p L = c i = x
t = p lt\\ = m v = y z = 2 unicode ...
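
A mapping table like the one above can be applied with a longest-match-first substitution. The sketch below uses only a subset of the entries, and applyMapping is a hypothetical name rather than the correctCaptcha function from the talk.

```javascript
// Subset of the table: OCR output on the left, real character on the right.
const MAPPING = { "?": "e", "IT": "n", "T": "7", "W": "w", "rf": "d", "]": "p", "L": "c" };

function applyMapping(ocrText, mapping = MAPPING) {
  // Try longer keys first so that "IT" wins over "T".
  const keys = Object.keys(mapping).sort((a, b) => b.length - a.length);
  let out = "";
  let i = 0;
  while (i < ocrText.length) {
    const key = keys.find(k => ocrText.startsWith(k, i));
    if (key !== undefined) {
      out += mapping[key];   // replace the misread sequence
      i += key.length;
    } else {
      out += ocrText[i];     // keep characters that need no correction
      i += 1;
    }
  }
  return out;
}

console.log(applyMapping("?W]LT")); // "ewpc7"
console.log(applyMapping("ITrfa")); // "nda"
```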

Success Rate

  • Improved from an initial 4% to 20%
  • 1 captcha solved successfully in every 5 attempts.
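
To put these rates in perspective, the short sketch below computes the expected number of attempts until the first success and the chance of at least one success within n attempts. It assumes every attempt is independent with the same success probability, which is a simplification; the function names are hypothetical.

```javascript
// Expected number of attempts until the first success (geometric distribution).
function expectedAttempts(p) {
  return 1 / p;
}

// Probability of at least one success within n independent attempts.
function successWithin(p, n) {
  return 1 - Math.pow(1 - p, n);
}

console.log(expectedAttempts(0.20), expectedAttempts(0.04));
console.log(successWithin(0.20, 5), successWithin(0.04, 5));
```

At 20% this matches the 1-in-5 figure, whereas the initial 4% would have needed about 25 attempts on average.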

But wait, we can do better.

What if we don't ask for a CAPTCHA?

CAPTCHA marked as solved

While the session is active, we only need to solve one captcha.

Re-Riding Attack
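
A minimal sketch of the server-side flaw behind this attack, and one possible fix, using an in-memory object as a stand-in for a real session. All names here are hypothetical; the actual website's code is not shown in the slides.

```javascript
// In-memory stand-in for a server-side session; all names are hypothetical.
function createSession() {
  return { captchaSolved: false };
}

function markSolved(session) {
  session.captchaSolved = true;
}

// Vulnerable check: the flag stays set, so one solve unlocks unlimited requests.
function allowRequestVulnerable(session) {
  return session.captchaSolved;
}

// Hardened check: the solve is consumed by the first request that uses it.
function allowRequestHardened(session) {
  if (!session.captchaSolved) return false;
  session.captchaSolved = false;
  return true;
}

const session = createSession();
markSolved(session);
console.log(allowRequestVulnerable(session), allowRequestVulnerable(session)); // true true
console.log(allowRequestHardened(session), allowRequestHardened(session));     // true false
```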

Distributed ReCaptcha Bot

Work in progress


  • Google allows good users to just click on "I'm not a robot"
    • Automate that click!


  • Extract and use the Google ReCaptcha validation token.
    • Make the crawler accept a ReCaptcha token so that it can simulate a successfully solved ReCaptcha.

Field Research

  • Crawlers tend to use TOR more and more.
  • ReCaptcha is painfully slow on the TOR network (for obvious reasons 😄).
  • Two requests are made:
    • (collect ReCaptcha token)
    • (to extract information)


  • Use a normal connection to extract the ReCaptcha token.
  • Use TOR to request the API information with the above ReCaptcha token.

Distributed ReCaptcha Solver

  • Develop Chrome extension to install in multiple computers.
  • Harvest google captcha token via
    • Note: The token expires after 120 seconds.
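
Because harvested tokens expire, whatever collects them needs to discard stale entries before handing one out. The sketch below shows one way to do that with a queue and a time-to-live; the TokenPool class and its method names are hypothetical, and only the 120-second lifetime comes from the slides.

```javascript
// Hypothetical queue that hands out only tokens younger than the TTL.
class TokenPool {
  constructor(ttlMs = 120000, now = () => Date.now()) {
    this.ttlMs = ttlMs;   // 120 seconds, per the note above
    this.now = now;       // injectable clock, which makes testing easier
    this.tokens = [];
  }

  // Store a token together with the time it was harvested.
  add(token) {
    this.tokens.push({ token: token, harvestedAt: this.now() });
  }

  // Return the oldest token that is still valid, or null if none is left.
  take() {
    while (this.tokens.length > 0) {
      const entry = this.tokens.shift();
      if (this.now() - entry.harvestedAt < this.ttlMs) return entry.token;
    }
    return null;
  }
}
```

Handing out the oldest valid token first reduces the number of tokens that expire unused.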

Inner workings

  • Try not to connect to the host website.
    • Block request and modify HTML page.
    • Not successful without editing /etc/hosts.
  • Block all requests except the main page request.

Block all (almost all) requests

chrome.webRequest.onBeforeRequest.addListener(function(data) {
  // Cancel every request from the controlled tab except the main page URL
  if (data.tabId == openedTabId
           && data.url != "") {
      return {cancel: true};
  }
},{'urls': ["*://*/*"]}, ["blocking"]);

Replace HTML

  • Replace HTML to only contain Google ReCaptcha.
// Load the ReCaptcha script (the script URL is left empty, as on the slide)
var head = document.getElementsByTagName('head')[0];
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = '';
head.appendChild(script);

// Empty the page body
var body = document.getElementsByTagName('body')[0];
while (body.firstChild) { body.removeChild(body.firstChild); }

// Add the ReCaptcha widget container as the only element on the page
var div = document.createElement("div");
div.setAttribute("style", "float:left;");
div.setAttribute("class", "g-recaptcha");
div.setAttribute("data-sitekey", "1XLd32hUUA522B0Gx7htcAQmanD890ZyCCo2i5T");
body.appendChild(div);

Auto Click

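if (document.querySelector(".recaptcha-checkbox") != null) {
  var delay = 3000 + Math.random() * 2000; // milliseconds
  setTimeout(function() {
    // Re-check: the checkbox may have disappeared during the delay
    var checkbox = document.querySelector(".recaptcha-checkbox");
    if (checkbox != null) {
      checkbox.click();
    }
  }, delay);
}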
  • Inject in google iframe.

Wait for success

  • Request with the captcha successfully solved

Future Evolution

Of Captcha Breakers

Possible solutions

  • Machine Learning
    • Keras
    • TensorFlow


Thank you

