Spam: no, not the popular canned meat, but the online nightmare that most website owners face every day.
What exactly is spam, you ask? Here is a quick breakdown of what spammers set out to do:
Advertising on a massive scale
Manipulating online voting systems
Destabilising a critical human equilibrium (i.e. creating an unfair advantage)
Vandalising or destroying the integrity of a website
Creating unnatural, unethical links to boost search engine rankings
Accessing private information
Spreading malicious code.
In other words, spam is any unwarranted interaction or input on a website, whether malicious or for the benefit of the spammer.
The spam plague
There are really only two ways to deal with spam: one is to filter it on the server side through a popular service such as Akismet or Mollom; the other (to get to the point of this article) is to use CAPTCHAs.
Automated spam plagues website owners to no end, so CAPTCHAs are appealing and compelling … initially. A CAPTCHA takes far less time to implement than moderating and reviewing user-generated content does, and that is what pushes most developers to add one.
CAPTCHAs are very popular: the reCAPTCHA project estimates that over 200 million reCAPTCHAs are completed daily and that each one takes an average of 10 seconds to complete.
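To put those two figures together, here is a quick back-of-the-envelope calculation in Python. It only multiplies the numbers quoted above; the function name and the constants are illustrative, and because the estimate is "over 200 million", the result should be read as a rough lower bound.

```python
# Figures quoted from the reCAPTCHA project's estimate above.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10


def daily_time_spent(captchas_per_day: int, seconds_each: float) -> dict:
    """Convert a daily CAPTCHA count into total human time spent per day."""
    total_seconds = captchas_per_day * seconds_each
    hours = total_seconds / 3600
    return {
        "seconds": total_seconds,
        "hours": hours,
        "years": hours / 24 / 365,
    }


spent = daily_time_spent(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{spent['hours']:,.0f} hours per day (about {spent['years']:.0f} years)")
# prints "555,556 hours per day (about 63 years)"
```

In other words, on these estimates people collectively spend around 63 years of effort on reCAPTCHAs every single day.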
CAPTCHAs approach the problem directly from the user side and focus on stopping spammers. Unfortunately this means that, for the most part, the actual user is overlooked and their normal behaviour is affected to some extent. Casey Henry looked at the effect of CAPTCHAs on conversion rates and suggested a possible conversion loss of around 3%, which could dent your sales.
Solving CAPTCHA codes
Personally I can’t stand CAPTCHA codes. Most of the time I rack my brain trying to decipher what one says, and when I enter my answer it’s normally wrong! Stanford University recently released a report entitled “How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation”, which revealed the shocking stat that audio CAPTCHAs (developed for the visually impaired) take an average of 28.4 seconds to complete, and which also noted some issues for non-English-speaking users! This doesn’t even begin to cover people with dyslexia and other special needs…
The question is, are CAPTCHAs so unusable that they shouldn’t be used at all? Perhaps more importantly, does a usable CAPTCHA that cannot be cracked exist? If the answer is no, what is the real solution to online spam?
One of the greatest advantages that humans have over machines is our ability to visually recognise patterns. The most popular CAPTCHA technique derives from this. Developers have explored many options such as games, equations, recognition tests, and even interactive tasks. The most popular type of CAPTCHA used now is text recognition (as seen on the reCAPTCHA project).
Optical character recognition technology
reCAPTCHA was created at Carnegie Mellon University, the home of the CAPTCHA pioneers, and is now run by Google. The project uses scanned text that optical character recognition (OCR) technology has failed to interpret. This, in theory, provides unbreakable CAPTCHAs. The project also provides audio alternatives for visually impaired users.
Another take on the basic text CAPTCHA was introduced in late 2010 by Solve Media, whose solution was to replace text with an advertisement and a related question, a move that many saw as too invasive.
Some suggestions have gone the way of logic questions, pitched at a level that a seven-year-old can answer. These are more accessible than text and image recognition, which is a big advantage, but it comes at a price.
Are CAPTCHAs the right solution?
First, the time required to read and comprehend these questions will vary because they are unusual and unknown to users. Secondly, computers can still break these CAPTCHAs, with the likes of IBM’s Watson recently showcasing an eerily human-like ability to process language.
The biggest problem with logic questions is that they’re specific to a language, usually English. Providing millions of questions in every language in order to avoid alienating potential users would be a huge task. When presented with such a daunting prospect, the same question resurfaces: are CAPTCHAs the right solution?
One of the more interesting ideas to have come about is ‘Friend Recognition’. This idea from Facebook, called social authentication, is used to verify account authenticity. It shows you a few pictures of your friends and asks you to name the person in the photos. The theory is that hackers might know your password but they don’t know your friends’ names.
There is only one problem: how many people can actually recognise most of their ‘friends’ on Facebook? The reality is that friend requests are exchanged like products at Hi-Fi Corporation. As clever as Facebook’s idea might be, it is flawed.
The deciding fact
Despite all the research that goes into CAPTCHA-breaking, most spammers are not going to go to the effort of defeating them. The sheer quantity of websites available to attack, and the speed at which spammers can attack them, mean that CAPTCHA-breaking is unlikely to concern many spammers.
The BBC, which is one of the most highly scrutinised institutions in the UK, recently commented:
“Visually impaired participants expected full accessibility from the BBC and we felt it would affect our reputation to use them. Elderly users had issues with the distorted text. The logic puzzles were found to be odd and patronising. The audio was struggled with. Overall, extremely negative feelings were expressed towards CAPTCHA technology,” said Rowun Giles, BBC.
Alternatives to CAPTCHA that need no user involvement do exist. CAPTCHAs just don’t cut it in today’s web world as a way to combat spam, because they create a bump for the user to contend with.
As I mentioned earlier, services such as Akismet and Mollom analyse and flag spam automatically, but why not develop your own system that is tuned to the mechanics of your website? Taking away the need for the user to stop spam will improve usability and the user’s impression of your website. Manual checking is sometimes a sacrifice worth making.
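As a rough illustration of what a home-grown system might look like, the Python sketch below scores a submission silently and sends anything suspicious to manual review. Everything in it is an assumption made for the example: the `Submission` type, the three signals (too many links, a form submitted implausibly fast, an empty body) and the thresholds are hypothetical and would need tuning to your own site.

```python
import re
from dataclasses import dataclass

# Matches the start of a URL; used to count links in a comment.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class Submission:
    text: str
    seconds_to_submit: float  # time between page load and form submission


def spam_score(sub: Submission, max_links: int = 2, min_seconds: float = 3.0) -> int:
    """Count how many hypothetical spam signals a submission triggers."""
    score = 0
    if len(LINK_PATTERN.findall(sub.text)) > max_links:
        score += 1  # assumed signal: unusually many links
    if sub.seconds_to_submit < min_seconds:
        score += 1  # assumed signal: filled in faster than a person could type
    if not sub.text.strip():
        score += 1  # assumed signal: empty body
    return score


def needs_review(sub: Submission, threshold: int = 1) -> bool:
    """Flag the submission for manual checking instead of challenging the user."""
    return spam_score(sub) >= threshold


genuine = Submission("Nice article, thanks!", seconds_to_submit=42.0)
suspect = Submission(
    "http://a.example http://b.example http://c.example", seconds_to_submit=0.5
)
print(needs_review(genuine), needs_review(suspect))  # prints "False True"
```

The user never sees any of this; the cost is moved to the moderator, who only has to look at the flagged items.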
The Honeypot method
The idea behind the honeypot method is simple: website forms include an additional field that is hidden from users. Spam robots process and interact with the raw HTML rather than rendering it, and therefore do not detect that the field is hidden. If data is inserted into this “honeypot”, the website administrator can be fairly certain that it was not entered by a genuine user. Remember that thieves (i.e. spammers) are looking for minimal work and a high payoff; it is not the particular method that stops intruders so much as the presence of any hurdle at all.
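A minimal sketch of the honeypot idea in Python follows. It is not tied to any particular framework: the field name `website`, the form markup and both function names are assumptions chosen for illustration. The form is rendered with one extra field hidden by CSS, and the server-side check treats any submission that fills that field in as a probable bot.

```python
from html import escape

# Hypothetical field name; a plausible-looking name tempts bots to fill it in.
HONEYPOT_FIELD = "website"


def render_comment_form(action: str = "/comments") -> str:
    """Return form HTML with an extra field that is hidden from human visitors."""
    return f"""<form method="post" action="{escape(action, quote=True)}">
  <label>Name <input type="text" name="name"></label>
  <label>Comment <textarea name="comment"></textarea></label>
  <div style="display:none" aria-hidden="true">
    <label>Leave this field empty
      <input type="text" name="{HONEYPOT_FIELD}" tabindex="-1" autocomplete="off">
    </label>
  </div>
  <button type="submit">Post</button>
</form>"""


def is_probable_bot(form_data: dict) -> bool:
    """A submission is suspicious if the hidden honeypot field contains anything."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


human = {"name": "Ann", "comment": "Great post!", "website": ""}
bot = {"name": "x", "comment": "Buy now", "website": "http://spam.example"}
print(is_probable_bot(human), is_probable_bot(bot))  # prints "False True"
```

The label inside the hidden block is there for the rare visitor whose browser or assistive technology ignores the CSS, so that a genuine user who does see the field knows to leave it blank.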
If the incentive to spam is removed, spam will slowly wear away over time, eventually removing the need for CAPTCHAs. In reality, though, we are likely to see a combination of technology and law dealing the death blow to spammers, hopefully.
For now, researching alternatives such as Akismet, where spam detection is silent to users, and implementing them on our websites is definitely the way forward. It will inevitably help with conversion rates and site usability: if users want to comment on your site, it should be a simple experience.
In conclusion, invisible systems are the way to go if you want to create a normal web experience for your user, and for now CAPTCHAs should be the last thing on your list of spam defences!