CAPTCHA
Completely Automated Public Turing test to tell Computers and Humans Apart
May 2018
David Magalhães
@speeddragon
David Magalhães
About me
@speeddragon
Software Engineer @
Security Analyst @
Introduction
History
- (Image distortion) CAPTCHA invented in 1997
- The AltaVista website used it in 1997
- PayPal started using it in 2001
- The term was first used in 2003
Where can we find it?
Websites
- Websites are common places to encounter Captchas.
CloudFlare
Apps
- Although not common, some apps can implement Google ReCaptcha to avoid bots.
How to bypass?
Where to start?
- Verify whether the web page implements the captcha correctly.
- Optical Character Recognition (OCR) software available for captchas.
Some types of attack
- Static CAPTCHA Identifier
- Fixation Attack
- Re-Riding Attack
- OCR Bruteforce
https://www.owasp.org/images/0/03/ASDC12-Attacking_CAPTCHAs_for_Fun_and_Profit.pdf
Image Processing
Human Workforce
- https://anti-captcha.com/mainpage
- $2 USD per 1,000 captchas
- 17s solve speed
- https://2captcha.com/
- $3 USD per 1,000 captchas
- 49s solve speed
Google ReCaptcha
Attacks and Responses
"I'm not a human: Breaking the Google reCAPTCHA"
March, 2016
https://www.blackhat.com/docs/asia-16/materials/asia-16-Sivakorn-Im-Not-a-Human-Breaking-the-Google-reCAPTCHA-wp.pdf
- Plays with cookies / user agent / etc.
- Trick website address with localhost.
- 2500 checkbox captchas per hour.
- Weekends had less blocking.
- Leverages Google Reverse Image Search, along with other machine learning software.
- Images were reused.
Voice Recognition
March, 2017
https://www.bleepingcomputer.com/news/security/researcher-breaks-recaptcha-using-googles-speech-recognition-api/
- Usage of SpeechRecognition library from Python
- Google Speech Recognition
- Google Cloud Speech API
- Houndify API
- Microsoft Bing Voice Recognition
Bypass via HTTP parameter pollution
March, 2018
https://andresriancho.com/recaptcha-bypass-via-http-parameter-pollution/
POST /recaptcha/api/siteverify
recaptcha-response=anything%26secret%3dPUBLIC-TEST-BYPASS_TOKEN&secret=6LeYIbsSAAAAAJezaIq3Ft_hSTo0YtyeFG-JgRtu
Around ~3% of the integrations with reCAPTCHA were vulnerable.
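A minimal sketch of why the payload above works, assuming the vulnerable back end decodes the user-supplied token and concatenates it into the siteverify body without re-encoding it. The function names and `APP_SECRET` are hypothetical placeholders, and the parameter is called `response` here for brevity.

```javascript
// Hypothetical illustration of HTTP parameter pollution in a siteverify request body.
// APP_SECRET and the function names are placeholders, not part of any real integration.
const APP_SECRET = "APP-SECRET-PLACEHOLDER";

// Vulnerable: the decoded user input is concatenated into the body as-is,
// so an embedded "&secret=..." becomes a second `secret` parameter.
function naiveBody(userToken) {
  return "response=" + userToken + "&secret=" + APP_SECRET;
}

// Safer: URLSearchParams percent-encodes the value, so "&" and "=" inside
// the token stay part of the `response` value.
function safeBody(userToken) {
  return new URLSearchParams({ response: userToken, secret: APP_SECRET }).toString();
}

// Token as the back end sees it after decoding the attacker's form field.
const attackerToken = "anything&secret=PUBLIC-TEST-BYPASS_TOKEN";

const naiveSecrets = new URLSearchParams(naiveBody(attackerToken)).getAll("secret");
const safeSecrets = new URLSearchParams(safeBody(attackerToken)).getAll("secret");

console.log(naiveSecrets); // two `secret` values, the attacker's first
console.log(safeSecrets);  // only the application's secret
```

With the naive version, the verifier receives two `secret` parameters; the bypass relies on the first one being honoured.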
Google Response
- Request frequency
- Audio went from normal, clear speech to barely intelligible speech (with distortions).
- Images went from clear pictures of cars, street signs, bridges, etc. to noisy, lower-resolution images.
- Image selection went from a fixed set of images to multiple images appearing with an added delay.
Incremental Difficulty
- Raise the number of digits in the voice captcha.
- Tweak the Advanced Risk Analysis System.
- Less tolerance for wrong answers / wrongly checked image boxes.
- Avoid image repetition.
How to implement?
Defending against possible attacks.
CloudFlare
- Use CloudFlare DNS
https://www.cloudflare.com/case-studies/troy-hunt/
Implement on the code
- Go to Google ReCaptcha page.
- Follow instructions.
- Adjust security.
Verify ReCaptcha
- Get g-recaptcha-response from User.
- Verify on the back end the token sent.
POST https://www.google.com/recaptcha/api/siteverify
secret=6LeIxAcTAAAAAGG-vFI1TnRWxMZNFuojJ4WifJWe&response=03ACgFB9smWHeHsOPEDTTb-OWMh-SgQISvttCGdp4tN4OW77W9r3bEeIHwd22EyQOmB466kdBm3SD26fMPeKByeXHJSKERi81bcH1b68ZwUU7W4m2TsAs65KzjUaE7t2uMffOR...2kMo4msFdLmj79uTeeCWaHZl2o5QqnF22qAImMSbxWMeMx5gC0O8SQINkmuPexXPHnpUmpzaqgI_WlseJI_q5VrDA
Verify ReCaptcha
{
"success": true|false,
"challenge_ts": timestamp, // timestamp of the challenge load (ISO format yyyy-MM-dd'T'HH:mm:ssZZ)
"hostname": string, // the hostname of the site where the reCAPTCHA was solved
"error-codes": [...] // optional
}
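As a sketch of what the back end might do with that JSON, assuming it has already been parsed into an object: accept the request only when `success` is true and the `hostname` matches your own site. The function name and `expectedHostname` parameter are assumptions for illustration.

```javascript
// Sketch: decide whether a parsed siteverify response should be trusted.
// `expectedHostname` and the function name are assumptions for illustration.
function isCaptchaValid(result, expectedHostname) {
  if (!result || result.success !== true) {
    return false; // the token was rejected (details may be in result["error-codes"])
  }
  // Reject tokens that were solved on a different site.
  return result.hostname === expectedHostname;
}

// Example responses shaped like the JSON above.
const ok = { success: true, challenge_ts: "2018-05-01T12:00:00Z", hostname: "www.example.com" };
const failed = { success: false, "error-codes": ["invalid-input-response"] };

console.log(isCaptchaValid(ok, "www.example.com"));     // true
console.log(isCaptchaValid(ok, "other.example.org"));   // false
console.log(isCaptchaValid(failed, "www.example.com")); // false
```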
Why ReCaptcha?
- State of the art CAPTCHA system.
- Always evolving.
- Easy to implement and to use.
Breaking Captcha
The Story
Once upon a time
A website with valuable information that didn't ask for a captcha.
And 24 hours later ....
... and 100,000 requests, something weird appeared.
https://code.google.com/archive/p/kaptcha/
But something was weird
The AJAX request didn't contain the CAPTCHA response.
- Old endpoint still enabled.
- New endpoint checked captcha.
One year later, they fixed it
And for a couple of months, I didn't have a solution for this ...
... until ...
$ aptitude search ocr
Ocrad
GNU Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method. It reads a bitmap image in pbm or pgm formats and produces text in byte (8-bit) or UTF-8 formats.
Ocrad includes a layout analyser able to separate the columns or blocks of text normally found on printed pages.
https://savannah.gnu.org/projects/ocrad/
Prepare Image first
- Create a better image before converting the IMAGE to TEXT
- Remove background
- Remove line
- Connect missing space
- Remove noise
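Two of these cleanup steps can be sketched on a tiny grayscale image, assuming it is stored as rows of 0-255 values. The cutoff of 128 and the "no dark neighbour" noise rule are illustrative assumptions, not the exact processing used in the talk.

```javascript
// Sketch of two cleanup steps on a grayscale image stored as rows of 0-255 values.
// The cutoff (128) and the "no dark neighbour" rule are illustrative assumptions.

// Remove background: anything lighter than the cutoff becomes white, the rest black.
function removeBackground(image, cutoff = 128) {
  return image.map(row => row.map(px => (px < cutoff ? 0 : 255)));
}

// Remove noise: a black pixel with no black neighbour (8-connected) is turned white.
function removeNoise(image) {
  const isBlack = (y, x) =>
    y >= 0 && y < image.length && x >= 0 && x < image[y].length && image[y][x] === 0;
  return image.map((row, y) =>
    row.map((px, x) => {
      if (px !== 0) return px;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          if ((dy !== 0 || dx !== 0) && isBlack(y + dy, x + dx)) return 0;
        }
      }
      return 255; // isolated speck
    })
  );
}

const raw = [
  [200,  30, 210, 220],
  [190,  40, 230,  20],
  [180,  35, 240, 250],
];
const cleaned = removeNoise(removeBackground(raw));
console.log(cleaned); // the vertical stroke in column 1 survives, the lone dark pixel does not
```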
Get various codes and choose the best
- NoLine Corrected
- NoBackground Corrected
- NoBackground Corrected with _
- Validating the code obtained
Threshold
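// Run Ocrad at thresholds 0.1 to 0.9 and keep every corrected result.
for ($i = 1; $i <= 9; $i++) {
    $v[$i] = self::correctCaptcha(
        trim((string) shell_exec("ocrad --threshold=0." . $i . " " . escapeshellarg($newFile)))
    );
}
Use various thresholds to obtain a better result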
Improve final solution
- Check if size is 5.
- Check if characters are lowercase.
- Limited alphanumeric range.
- For each character find "_", and try to find another character in one of the 9 threshold solutions.
- Remove "blank" character.
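The checks above can be sketched as follows, assuming "_" marks a character the OCR could not read and that a valid code is five lowercase alphanumeric characters. All names are hypothetical, and the exact character range is an assumption.

```javascript
// Sketch: combine OCR results from several thresholds into one candidate.
// Assumptions: "_" marks an unread character, the code is 5 lowercase
// alphanumeric characters, and all names here are hypothetical.
const CODE_LENGTH = 5;
const VALID_CODE = /^[a-z0-9]{5}$/;

// Fill each "_" in `base` with the character another result has at that position.
function fillUnknowns(base, others) {
  return [...base]
    .map((ch, i) => {
      if (ch !== "_") return ch;
      const donor = others.find(o => o.length === base.length && o[i] !== "_");
      return donor ? donor[i] : "_";
    })
    .join("");
}

// Try each result as the base and return the first one that validates.
function bestCode(results) {
  const cleaned = results.map(r => r.replace(/\s+/g, "")); // drop "blank" characters
  for (const base of cleaned) {
    if (base.length !== CODE_LENGTH) continue;
    const candidate = fillUnknowns(base, cleaned.filter(r => r !== base));
    if (VALID_CODE.test(candidate)) return candidate;
  }
  return null; // nothing usable, request a new captcha
}

console.log(bestCode(["a_c4e", "ab_4e", "abc4"])); // "abc4e"
```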
Table mapping
Run on some captchas with known solutions ...
? = e | %% = 2y | y = 2 | IT = n | T = 7 |
W = w | rf = d | ] = p | L = c | i = x |
t = p | lt\\ = m | v = y | z = 2 | unicode ... |
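A correction table like this can be applied in a single pass, so that one replacement never feeds into another. The sketch below uses a subset of the entries and assumes each pair reads as "OCR output = real character"; the function name is hypothetical.

```javascript
// Sketch: apply a correction table to raw OCR output in a single pass.
// The entries are a subset of the table above; reading it as
// "OCR output = real character" is an assumption.
const CORRECTIONS = new Map([
  ["%%", "2y"],
  ["IT", "n"],
  ["rf", "d"],
  ["?", "e"],
  ["T", "7"],
  ["W", "w"],
  ["]", "p"],
  ["L", "c"],
]);

// Escape regex metacharacters so keys like "?" and "]" match literally.
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Longest keys first so "IT" wins over "T".
const pattern = new RegExp(
  [...CORRECTIONS.keys()]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegex)
    .join("|"),
  "g"
);

function correctOcr(text) {
  return text.replace(pattern, match => CORRECTIONS.get(match));
}

console.log(correctOcr("W?IT")); // "wen"
console.log(correctOcr("T]L"));  // "7pc"
```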
Success Rate
- Improved from an initial 4% to 20%.
- One captcha solved successfully in every five attempts.
But wait, we can do better.
What if we don't ask for a CAPTCHA?
CAPTCHA marked as solved
While the session stays alive, we only need to solve one captcha.
Re-Riding Attack
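A small model of the server side shows why this works, assuming the site stores a "captcha solved" flag in the session and never resets it. The session object and both function names are hypothetical; the one-shot variant is one possible countermeasure, not the fix the site applied.

```javascript
// Sketch of a server-side session flag, to show why re-riding works and one way to stop it.
// The session object and function names are hypothetical.

// Vulnerable check: once the flag is set, every later request in the session passes.
function allowRequestVulnerable(session) {
  return session.captchaSolved === true;
}

// Stricter check: the solved flag is consumed, so each request needs a fresh solve.
function allowRequestOneShot(session) {
  const allowed = session.captchaSolved === true;
  session.captchaSolved = false;
  return allowed;
}

const sessionA = { captchaSolved: true };
const vulnerable = [1, 2, 3].map(() => allowRequestVulnerable(sessionA));

const sessionB = { captchaSolved: true };
const oneShot = [1, 2, 3].map(() => allowRequestOneShot(sessionB));

console.log(vulnerable); // [true, true, true]
console.log(oneShot);    // [true, false, false]
```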
Distributed ReCaptcha Bot
Work in progress
Idea
- Google allows good users to just click on "I'm not a robot"
- Automate that click!
Idea
- Extract and use the Google ReCaptcha validation token.
- Implement recaptcha token acceptance on the crawler to simulate recaptcha success behaviour.
POST http://www.example.com/get
id=323184&gRecaptchaResponse={{gCaptchaToken}}
Field Research
- Crawlers tend to use TOR even more.
- ReCaptcha is painfully slow on the TOR network (for obvious reasons 😄).
- Two requests are made:
- www.google.com (collect the ReCaptcha token)
- www.example.com (to extract information)
Field Research
- Use a normal connection to extract the recaptcha token.
- Use TOR to request API information with the above ReCaptcha token.
Distributed ReCaptcha Solver
- Develop a Chrome extension to install on multiple computers.
- Harvest the Google captcha token via commie.io
- Note: The token expires after 120 seconds.
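Because harvested tokens expire, the crawler side needs to discard stale ones before use. A minimal sketch, assuming the 120-second lifetime noted above; the `TokenPool` class and its methods are hypothetical, and the clock is injectable so the behaviour can be checked without waiting.

```javascript
// Sketch: a pool that hands out harvested tokens and drops expired ones.
// The 120-second lifetime is from the note above; the class and its methods are hypothetical.
const TOKEN_TTL_MS = 120 * 1000;

class TokenPool {
  constructor(now = () => Date.now()) {
    this.now = now;   // injectable clock, handy for testing
    this.tokens = []; // oldest first
  }

  add(token) {
    this.tokens.push({ token, harvestedAt: this.now() });
  }

  // Return the oldest token that is still valid, or null if none is left.
  take() {
    while (this.tokens.length > 0) {
      const entry = this.tokens.shift();
      if (this.now() - entry.harvestedAt < TOKEN_TTL_MS) return entry.token;
    }
    return null;
  }
}

let clock = 0;
const pool = new TokenPool(() => clock);
pool.add("token-A");      // harvested at t = 0 s
clock = 100 * 1000;
pool.add("token-B");      // harvested at t = 100 s
clock = 130 * 1000;       // token-A is now 130 s old, token-B 30 s old
const first = pool.take();
const second = pool.take();
console.log(first);       // "token-B"
console.log(second);      // null
```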
Inner workings
- Try to not connect to the host website.
- Block request and modify HTML page.
- Not successful without editing /etc/hosts.
- Block all requests except the main page request.
Block all (almost all) requests
// Cancel every request from the opened tab except the main page request.
chrome.webRequest.onBeforeRequest.addListener(function(data) {
  if (data.tabId == openedTabId
      && data.url != "http://www.example.com/") {
    return {cancel: true};
  }
}, {'urls': ["*://*.example.com/*"]}, ["blocking"]);
Replace HTML
- Replace HTML to only contain Google ReCaptcha.
var head = document.getElementsByTagName('head')[0];
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = 'https://www.google.com/recaptcha/api.js?hl=pt_PT';
head.appendChild(script);
var body = document.getElementsByTagName('body')[0];
while (body.firstChild) { body.removeChild(body.firstChild); }
var div = document.createElement("div");
div.setAttribute("style", "float:left;");
div.setAttribute("class", "g-recaptcha");
div.setAttribute("data-sitekey", "1XLd32hUUA522B0Gx7htcAQmanD890ZyCCo2i5T");
body.appendChild(div);
Auto Click
if (document.querySelector(".recaptcha-checkbox") != null) {
  var delay = 3000 + Math.random() * 2000; // milliseconds
  setTimeout(function() {
    if (document.querySelector(".recaptcha-checkbox") != null) {
      document.querySelector(".recaptcha-checkbox").click();
    }
  }, delay);
}
- Inject in google iframe.
Wait for success
- Request sent when the captcha is solved successfully
https://www.google.com/recaptcha/api2/userverify?k=X8LdChUUA3AAAABgG302AQfn69kNDSnm23lbo
Future Evolution
Of Captcha Breakers
Possible solutions
- Machine Learning
- Keras
- TensorFlow
https://github.com/JackonYang/captcha-tensorflow
https://medium.com/@ageitgey/how-to-break-a-captcha-system-in-15-minutes-with-machine-learning-dbebb035a710
FunCaptcha
Thank you
Questions?
Learn more about captcha, ReCaptcha, insights and vulnerabilities.