This post was featured on the PHP Weekly newsletter, Aug. 8, 2013.
On my website, I decided to implement my own anti-robot captcha system for two reasons: 1) I wanted to keep the website code vanilla and 2) I disliked the CAPTCHAs provided by other services that either made you type in Chinese or that didn’t visually integrate well with the website.
Pre-flight check
Before we start, you should have the following ready:
- Knowledge of HTML
- An HTML form already set up on your website, capable of sending data to another page
Example: <form method="POST" action="senddata.php" ...>
- A text editor; I recommend Notepad++
- A PHP installation. Almost all webhosts will have this.
- Programming knowledge is a plus, but I will provide plenty of comments on the code.
All set? Great, let’s move on!
Overview
So far, we have an HTML form that submits data to a PHP page for processing. All we need to do now is add a captcha system to the form.
Our captcha will be math based, so it will ask a random math question such as “What is 3+2?” or “What is 8 * 4?”. There will also be an input box for the user/robot’s answer.
We will send the user/robot’s answer to the page specified in the form’s action attribute and check whether it matches the actual answer. If it does, we process the form. If not, we stop the process.
Step 2: Adding the CAPTCHA.
First of all, we have to be able to generate a random math problem. We’ll try to keep it simple enough — no decimals, negative numbers, or algebra — but random enough so that robots can’t predict the answer. This basically means we’ll need:
- Two random numbers
- A random mathematical operation from the list: addition, multiplication, and subtraction. I left out division because of the possibility of requiring decimals (e.g. 10 divided by 3)
In PHP, making random numbers is simple enough: the rand(int $min, int $max) function gives us random integers. Now how about the operator? The solution is to choose a random number between 0 and 2 and have each number represent a different operator. On the user’s end, we display a plus, minus, or multiplication sign. Thus, we arrive at this code:
<?php
$num1 = rand(0,10); // pick a random number from 0 to 10 inclusive
$num2 = rand(0,10); // same idea
$o = rand(0,2); // 0 = plus, 1 = minus, 2 = multiply

/* This function uses the integer value of $o to show either a plus, minus, or times sign. */
function operand($o) {
    switch($o) {
        case 0: return "+";
        case 1: return "-";
        case 2: return "*";
        default: return "?"; // Remark: We shouldn't ever get down here.
    }
}
?>
You can put this code at the beginning of your form.
For the HTML part, all we need is a label, an input field, and three hidden fields to store our two numbers and the integer representing our operator. Here is the bare-bones code. You should add this part to the end of your form, right before the Submit button.
<label for="math">What is <?php echo $num1 . " " . operand($o) . " " . $num2 . "?"; ?></label>
<input type="text" id="math" name="userAnswer" size="3">
<input type="hidden" name="num1" value="<?php echo $num1; ?>">
<input type="hidden" name="operand" value="<?php echo $o; ?>">
<input type="hidden" name="num2" value="<?php echo $num2; ?>">
<br>
<input type="submit" name="submit" value="Verify">
This warrants a short explanation. The label displays the math problem in a form the user can read. The text input collects the user/robot’s answer. The three hidden fields hold the math captcha data that is sent when the user clicks the submit button; on the validation page, this data is used to check the user/robot’s input.
Step 3: Creating the validation page.
In the line of code containing the beginning of your form, you should have at least the method and action attributes already specified. If not, you can use this as an example:
<form action="submit.php" method="POST" name="myForm"></form>
You can name the file in the action attribute whatever you want, but it MUST end in .php.
Now open that file (create it if there is none), and we’ll begin the validation process.
Step 4: The validation code
Now, we need to get the data from the captcha into PHP for processing. Recall that in the HTML code of Step 2, we specified four input fields, three of which were hidden. We put a name attribute on each of those, and now we can read their values in PHP. Copy and paste the following into the blank file:
<?php
// Safety check: make sure every captcha field arrived and the answer is not blank
if (!isset($_POST["userAnswer"], $_POST["num1"], $_POST["num2"], $_POST["operand"])
    || trim($_POST["userAnswer"]) === "") {
    exit("You did not enter an answer! Try again.");
}

$userAnswer = trim($_POST["userAnswer"]); // This is what the client entered

/* Compute the actual answer */
// Get the values in our form, cast to integers
$num1 = (int) $_POST["num1"];    // First number
$num2 = (int) $_POST["num2"];    // Second number
$o    = (int) $_POST["operand"]; // INTEGER value of our operand (0, 1, or 2; corresponding to +, -, or *, respectively)

// Calculate the actual answer
switch ($o) {
    case 0: $actual = $num1 + $num2; break; // 0 = Addition
    case 1: $actual = $num1 - $num2; break; // 1 = Subtraction
    case 2: $actual = $num1 * $num2; break; // 2 = Multiplication
    default: exit("Sorry, you didn't pass the captcha."); // Unknown operation
}

/* Check against the user's input and cancel form submission if it's incorrect */
// FILTER_VALIDATE_INT returns false for non-integer text such as "abc",
// so the strict comparison below rejects it instead of treating it as 0
$answer = filter_var($userAnswer, FILTER_VALIDATE_INT);
if ($answer !== $actual) {
    exit("Sorry, you didn't pass the captcha.");
}
?>
<!-- code to execute on success -->
After this bit of code, you can put the desired code to execute. The code will now execute if and only if the user answers the captcha correctly.
Explaining how to read form data in PHP is beyond the scope of this post. If you followed the code carefully, however, the general idea is to use $_POST[$name], where $name is the value of the HTML element’s name attribute.
Step 5: Congratulations!
There you are: a simple way to create a math-based captcha system. It wasn’t that bad, was it?
Step 6: But wait, there’s more!
I’ve set up a bare-bones demo in my sandbox so that you can see the captcha in action. The demo is at http://g-liu.com/sandbox/captchaform/.
You can also get the source code files below. Open these in a text editor.
- Form page: index.phps
- Validation page: submit.phps
If you have any questions, feel free to ask in the comments below.
Thank you Geoffrey for this walkthrough, the code does exactly what you say and works very well.
To make sure the answer is not a negative number, and also to simplify the math for the user, I made the first number 3-5 and the second number 1-2.
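A minimal sketch of what that change might look like, using the ranges from the comment above. The function name makeEasyProblem is made up for this example, and the operator numbering follows the post (0 = plus, 1 = minus, 2 = multiply).

```php
<?php
// Hypothetical helper: build a problem with the first number in 3-5
// and the second number in 1-2, as suggested in the comment.
function makeEasyProblem(): array
{
    $num1 = rand(3, 5);
    $num2 = rand(1, 2);
    $o    = rand(0, 2); // 0 = plus, 1 = minus, 2 = multiply
    return [$num1, $o, $num2];
}

list($num1, $o, $num2) = makeEasyProblem();
// The smallest possible subtraction is 3 - 2 = 1, so the answer is never negative.
```

Because the first range sits entirely above the second, no swapping of operands is needed for subtraction.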
Hi, I love your code. But I realise there is one problem: your math question doesn’t refresh when the user presses the back button. Is it possible to come up with a solution to this? Thanks.
Good point. The back button may not cause a page refresh if the page is cached.
You can force the page to reload if the user has pressed the Back button, with a little JavaScript. See this SO Post for a good starting point.
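A possible server-side alternative, offered only as a sketch: the form page can send headers asking the browser not to cache it, so that returning to the page usually triggers a fresh request and a new question. Browser behaviour with the Back button varies, so treat this as an assumption to test rather than a guarantee. The helper name noCacheHeaders is hypothetical.

```php
<?php
// Hypothetical helper: headers asking the browser not to cache the form page.
function noCacheHeaders(): array
{
    return [
        "Cache-Control: no-store, no-cache, must-revalidate, max-age=0",
        "Pragma: no-cache",
        "Expires: 0",
    ];
}

// Headers must be sent before any HTML output on the form page.
if (!headers_sent()) {
    foreach (noCacheHeaders() as $line) {
        header($line);
    }
}
```

This would go at the very top of the form page, above the code that picks the random numbers.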
Your script will return negative numbers like 0-8=-8
Rather than misleading naive users, breaking this into several posts would also teach the new user about modular design. This introductory post would introduce the skeleton design, with the details of some functions fleshed out in later posts.
What if you never get to those later posts? Well, the user is still left with a better understanding of both writing software and CAPTCHAs than they would have from this non-working naive example.
I wrote the CAPTCHA that an old employer used on its e-commerce purchasing page. The CAPTCHA image was created in PHP, but the answer was checked on a different box in Perl. (The box trusted with credit card transactions had a minimal OS install, to minimize the vulnerability surface.) The two boxes just shared an HMAC key. The md5 HMAC of the IP address and (time() | 0x1F) was truncated to get a 96-bit cryptographic token. The cryptographic token was run through HMAC one more time to seed a 31-bit LFSR that output the base32 characters for the CAPTCHA, and afterwards the LFSR was the entropy source for the image randomization. This ensured that if some of the CAPTCHA images were weaker than others, a bot could only have a look at one puzzle every 32 seconds for each IP address it had, instead of being able to ask for as many images as it wanted and only solve the easy ones. The timestamp and crypto token were hidden fields in the form. On the verification side, the Perl script would (1) reject any timestamp older than 30 minutes, (2) reject any answer coming from an IP address with too high a recent percentage of CAPTCHA failures, (3) calculate the md5 HMAC of the IP address and (timestamp | 0x1F) and reject if it didn’t match the crypto token, (4) reject any answer if the crypto token matched any successful CAPTCHA answer in the past 30 minutes, and finally (5) calculate the md5 HMAC of the crypto token in order to seed the LFSR and figure out what the correct CAPTCHA answer was. Even if an attacker got a hold of the source code, as long as they didn’t have the shared HMAC key, the most efficient attack would be to brute-force the 31-bit LFSR seed. (Assuming the image was sufficiently difficult for computer vision.)
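The token part of the scheme described above might be sketched in PHP roughly as follows. This is an illustration under assumptions, not the commenter's code: the function names, the way the IP address and timestamp are joined into one string, and the sample key are all made up here. It covers only token creation and checks (1) and (3); the LFSR, the image generation, the failure-rate limit, and the reuse tracking are left out.

```php
<?php
// Assumed helper: 96-bit token = first 24 hex characters of an MD5 HMAC
// over the IP address and the timestamp with its low 5 bits set.
function captchaToken(string $ip, int $timestamp, string $key): string
{
    $bucket = $timestamp | 0x1F; // one value per 32-second window
    return substr(hash_hmac("md5", $ip . "|" . $bucket, $key), 0, 24);
}

// Assumed helper: checks (1) and (3) from the description.
function tokenIsValid(string $token, string $ip, int $timestamp, string $key, int $now): bool
{
    if ($now - $timestamp > 30 * 60) {
        return false; // (1) timestamp older than 30 minutes
    }
    // (3) recompute the token and compare in constant time
    return hash_equals(captchaToken($ip, $timestamp, $key), $token);
}

$key    = "example-shared-key"; // placeholder; a real key would be long and random
$issued = 1375920000;           // sample Unix timestamp
$token  = captchaToken("203.0.113.7", $issued, $key);

$sameIp  = tokenIsValid($token, "203.0.113.7", $issued, $key, $issued + 60);  // true
$otherIp = tokenIsValid($token, "198.51.100.9", $issued, $key, $issued + 60); // false
```

Because the IP address is part of the HMAC input, a token solved from one address fails check (3) when replayed from another, which matches the behaviour the commenter describes.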
During development, I actually watched an attacker try and verify a huge batch of stolen credit cards against my old employer’s credit card processor. The attacker was coming from an IP address in Virginia. When I flipped the switch to start checking if the crypto token was being reused (s/he solved the CAPTCHA once by hand), the attacker assumed s/he had hit a rate limit and immediately switched to an IP address in England. Because the token was tied to the IP address, the answers were rejected even sooner. The attacker then went away to find someone else’s e-commerce site to use for verifying stolen credit cards.
Note that I originally wrote a 32-bit LFSR in PHP, but the underflow behavior of Perl integers made it much easier to use a 31-bit LFSR. Also note that the checking if a crypto token was being reused came fairly late because that information was kept in a bdb database on disk on the credit card processing box, and I wanted to minimize disk I/O.
There was also one /24 IP subnet in the DC metro area that we needed to whitelist. Due to the email addresses they were giving us, we were pretty sure that was the satellite ground station for the Internet connections of a ton of Navy sailors. We were often getting more than one legit purchase in 32 seconds from that IP address. I might have actually AND-ed out the least significant octet of the IP address, limiting purchases to one in 32 seconds per /24 subnet, but you get the idea.
Anyway, feel free to implement the above in the language of your choice. It served us well. The reason we didn’t go with an existing open-source CAPTCHA was that all of the existing open-source CAPTCHAs we could find would have required the image generator and the verifier to share a filesystem, and there was no way we were going to open up that attack surface on the credit card processing box.
I am not buying your arguments. Simple or not the required goal is that it prevents SPAM-robots at all costs.
If you would like to contribute a post to this blog, with simple enough PHP that the beginning web developer can understand, AND able to stop 100% of robots all the time, please contact me and I will post it on this blog. Thank you!
This is the farthest thing from being anti-robot. If you think this will prevent spammers and the like from your website, you should really read up on why those ‘chinese’ captchas are the way they are.
With a little jQuery:
var answer;
switch ($('input[name=operand]').val()) {
case '0': answer = parseInt($('input[name=num1]').val(), 10) + parseInt($('input[name=num2]').val(), 10); // the hidden field holds 0 for plus
break;
case '1': …….
….
}
This is a simple example so that a beginning web designer could understand the code. I could definitely add some more anti-robot features, such as replacing +, – and * with “plus”, “minus” and “times”, or even with tiny pictures representing those characters, and replacing the numbers with words (e.g. 5 => “five”). You could even turn those words into pictures as well and give them file names that would confuse the robot, e.g. name the picture showing “five” two.gif 🙂
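A minimal sketch of the words-instead-of-symbols idea, assuming the same 0-10 number range and 0/1/2 operator codes as the post. The function names are hypothetical.

```php
<?php
// Hypothetical helper: spell out the numbers 0-10 in words.
function numberWord(int $n): string
{
    $words = ["zero", "one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten"];
    return $words[$n] ?? (string) $n; // fall back to digits outside 0-10
}

// Hypothetical helper: spell out the operator code in words.
function operatorWord(int $o): string
{
    $words = ["plus", "minus", "times"]; // codes 0, 1, 2 as in the post
    return $words[$o] ?? "?";
}

function questionInWords(int $num1, int $o, int $num2): string
{
    return "What is " . numberWord($num1) . " " . operatorWord($o)
        . " " . numberWord($num2) . "?";
}

$question = questionInWords(5, 2, 3); // "What is five times three?"
```

Note that this only changes the visible label; as long as the hidden fields still carry the raw numbers, a script can read them directly.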
Of course, no system is completely robot-proof; you can always have brute-force human solvers who feed answers into the robots. But having some anti-spam protection is better than having none at all.