Spam comments can be a real problem for any website. It's not just the big guys; little guys such as myself get hit by them all the time.
Many websites choose to use something called CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) to deal with this problem. Normally the CAPTCHA takes the form of a word in a box (which is actually an image generated by the server) that you have to read and type in. Unfortunately these aren’t as effective as they used to be. You see, OCR (Optical Character Recognition) has moved on a fair bit in recent years, to the point where computers are very good at reading the word (sometimes even better than the humans).
There are of course a few other variants of this system, such as picking out shapes, although many of them can be solved by a computer through trial and error.
But complex (and, more often, useless) as these current solutions are, they are still somewhat flawed. They require the human to do something to stop spam, rather than the other way round.
Clearly this is not an ideal solution. I prefer something known as negative CAPTCHA. This is a system where you have a comments field, but you hide it from the user using a language the typical spam robot doesn’t understand (CSS in this case). You then have another comments box which the user does see but which is not called comments (something like feedback will suffice). Then, when a comment is sent in, you check whether the comments field (AKA the hidden one) is filled in. If it is, then it must have been a robot, because the user couldn’t see the box to fill it in. This way the user has nothing to do and it is up to the robot to give itself away.
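As a rough illustration of the check described above, here is a minimal sketch in Python. It is not the code my gallery runs. The field names comments and feedback come from the description above, while the function names and the idea of the submitted form arriving as a plain dictionary are assumptions made for the example.

```python
# Field names follow the description in the post: "comments" is the trap
# hidden with CSS, "feedback" is the box a human visitor actually sees.
HONEYPOT_FIELD = "comments"
REAL_FIELD = "feedback"


def is_probably_robot(form: dict) -> bool:
    """Return True when the hidden trap field came back with text in it."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def extract_comment(form: dict):
    """Return the visitor's comment, or None if the submission should be dropped.

    Hypothetical helper: a submission is dropped when the trap field is
    filled in, or when the visible field is empty.
    """
    if is_probably_robot(form):
        return None
    return form.get(REAL_FIELD, "").strip() or None
```

A human who never saw the hidden box sends it back empty, so `extract_comment({"comments": "", "feedback": "Nice photo"})` returns the comment, while a robot that fills in every field it finds gets `None`.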
This is a nice system and one I have been using for several years with a very high success rate. It seems to be a rare idea which I hardly ever see mentioned; people for some reason seem to like proactive defence.
To give you an idea, before I put this in I was getting about 70 spam comments a day on my gallery, maybe more. Since I put it in I get maybe one spam comment every couple of months.
All very nice, you are thinking, but there is more…
You see, when going through my logs I found an anti-spam side effect of a feature I use all the time.
My gallery uses ID numbers for pictures. When you request a picture you might request the following:
URL http://www.craigk.org/pictures/p/403/
Now this URL suggests that you are viewing the index page in a folder called 403, which is inside a folder called p, which is inside the folder pictures. However, I actually use something called mod_rewrite to change the URLs. So in this case the real URL which this links to is as follows:
URL http://www.craigk.org/pictures/view.php?id=403
I use mod_rewrite to make the URLs cleaner and nicer looking.
For those who care, the code for this is placed in a .htaccess file in the pictures folder and reads:
RewriteEngine On
RewriteRule ^p/([^/]+)/$ view.php?id=$1 [L]
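If the rule looks cryptic, the following Python sketch mimics what it does to a path. It is only an imitation for illustration, not how Apache works internally. The function name is made up, and it assumes the usual .htaccess behaviour where the pattern is matched against the path relative to the folder the file lives in (so p/403/ rather than /pictures/p/403/).

```python
import re

# Same pattern as the RewriteRule: "p/", then one or more characters that
# are not a slash (captured as $1), then a closing slash and nothing more.
RULE = re.compile(r"^p/([^/]+)/$")


def rewrite(relative_path: str) -> str:
    """Return the rewritten path, or the input unchanged if the rule misses."""
    match = RULE.match(relative_path)
    if match is None:
        return relative_path
    return "view.php?id=" + match.group(1)


print(rewrite("p/403/"))                 # view.php?id=403
print(rewrite("p/403/add_comment.php"))  # p/403/add_comment.php (no match)
```

Note that only a path of exactly the form p/something/ is rewritten. Anything with extra bits after the final slash is left alone, and since no real folder called p exists, such a request finds nothing.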
This has a side effect for spammers, though. You see, the spammers don’t know that there is no folder called 403 or called p. So when they read that my comments file is called add_comment.php, they have to assume that this file is located here: http://www.craigk.org/pictures/p/403/add_comment.php (which it isn’t). It is actually located here: http://www.craigk.org/pictures/add_comment.php (if you click that link you get the error shown to anyone who didn’t arrive at this file through my comments form).
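To see why the guess goes wrong, here is a small Python sketch of the assumption a spam script is likely making: that the comments file sits next to the page it was found on. The function name is hypothetical and I am only guessing at how any particular robot works, but resolving a bare file name against the page address is the standard way relative URLs are handled.

```python
from urllib.parse import urljoin

CLEAN_URL = "http://www.craigk.org/pictures/p/403/"
REAL_URL = "http://www.craigk.org/pictures/view.php?id=403"


def guess_script_url(page_url: str, script_name: str = "add_comment.php") -> str:
    """Resolve a bare script name relative to the page it appeared on."""
    return urljoin(page_url, script_name)


# What a robot that only sees the clean URL ends up requesting:
print(guess_script_url(CLEAN_URL))
# http://www.craigk.org/pictures/p/403/add_comment.php

# Where the same guess would land if it knew the real URL:
print(guess_script_url(REAL_URL))
# http://www.craigk.org/pictures/add_comment.php
```

The first address does not exist on the server, so the spam never reaches the comments script at all.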
According to my website log, this .htaccess side effect has stopped 53 spam comments in the last 10 days.
I like getting something for nothing.