Whatever your personal view on pornography, there are situations in which viewing explicit images and videos is never acceptable. Browsing porn at work is generally a sackable offence, for instance, and local laws usually forbid exposing children to explicit imagery.
To help uphold these standards, companies and parents regularly rely on content filters that block access to ‘adult’ websites. Social media companies are also doing their bit, automatically detecting and removing pornographic images.
But there is a problem. Automated porn detection systems are not as accurate as you might hope.
Machine learning and rude pics
Machine learning is a relatively new technology that allows businesses to automate key tasks. By showing an algorithm enough examples of a specific type of file or picture, the system learns to spot the identifying features of each.
Take Panda Dome, for instance. Our anti-malware system is trained to block and remove viruses automatically, even if we’ve never seen that particular infection before. How does this work? Our system has been trained using tens of thousands of examples of different malware; it has then learned – with a high degree of accuracy – to spot the same tell-tale signs in other apps, links and email attachments.
Porn blockers work in a similar way: show the system enough pornographic images and it learns to recognise other potentially offensive pictures.
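To make the idea concrete, here is a minimal, hypothetical sketch in Python. It trains a simple classifier on labelled example images (represented here as synthetic feature vectors rather than real pictures) and then scores a new image. The feature extraction, the labels and the scikit-learn model choice are all illustrative assumptions, not a description of any real filter.

```python
# Minimal sketch of example-based image classification (hypothetical).
# Real filters use far richer image features and much larger models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Pretend each image has already been reduced to a 10-number feature vector.
# Label 1 = 'explicit' example, label 0 = 'safe' example.
explicit_examples = rng.normal(loc=1.0, scale=0.5, size=(500, 10))
safe_examples = rng.normal(loc=-1.0, scale=0.5, size=(500, 10))

X = np.vstack([explicit_examples, safe_examples])
y = np.array([1] * 500 + [0] * 500)

# 'Show the system enough examples' = fit a model to the labelled data.
model = LogisticRegression()
model.fit(X, y)

# Score a new, unseen image: the model returns a probability, not a verdict.
new_image_features = rng.normal(loc=0.2, scale=0.5, size=(1, 10))
probability_explicit = model.predict_proba(new_image_features)[0, 1]
print(f"Probability this image is explicit: {probability_explicit:.2f}")
```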
But there’s a problem.
The nuances of nudity
The trouble with automatic porn detection is that every person has a slightly different definition of what is, and is not, acceptable. A picture of a woman in a bikini may be fine, while a similar picture of a lingerie-clad woman is not. To an algorithm, the woman in both pictures is wearing roughly the same amount of clothing, but it struggles to identify the context in which that clothing appears.
Social networks are already experiencing exactly this problem. Facebook famously removes pictures of breastfeeding mothers, for instance, because its algorithm is unable to understand the context of the image. Tumblr has had even bigger problems since activating its porn filter in December – the system has been wrongly flagging and removing all manner of non-porn images, including fish, vases and witches.
Whether an intelligent filter permits or blocks a borderline image, someone will be upset.
Ultimately, intelligent systems built using machine learning are flawed because the people training them cannot specify exactly where the line between ‘OK’ and ‘porn’ lies. The algorithms may be able to block 99.9% of questionable content, but the remaining 0.1% will always be an issue.
These nuances are irrelevant to anti-malware detection because a file can only be in one of two states: ‘virus’ or ‘not virus’. Images, on the other hand, have three states: ‘porn’, ‘not porn’ and ‘maybe porn’. ‘Maybe porn’ is where machine learning can (and does) fail. It is also where the majority of investment in automated systems will take place in the coming years.
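In practice, the ‘maybe porn’ state usually comes from the classifier’s confidence score rather than a separate category. Here is a hedged sketch of how a filter might turn one probability into three outcomes – the two thresholds are invented purely for illustration.

```python
# Hypothetical thresholds: anything between them lands in the awkward
# 'maybe porn' zone that automated systems handle badly.
BLOCK_THRESHOLD = 0.90   # above this, confidently block
ALLOW_THRESHOLD = 0.10   # below this, confidently allow

def classify_image(probability_explicit: float) -> str:
    """Map a classifier's confidence score onto the three states."""
    if probability_explicit >= BLOCK_THRESHOLD:
        return "porn"        # clear-cut: block
    if probability_explicit <= ALLOW_THRESHOLD:
        return "not porn"    # clear-cut: allow
    return "maybe porn"      # the grey area where filters fail

# Example scores: 0.95 is blocked, 0.02 is allowed, and 0.55
# (a breastfeeding photo, a vase...) would need human judgement.
for score in (0.95, 0.02, 0.55):
    print(score, "->", classify_image(score))
```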
Machine learning is powerful, useful and flexible. Perhaps with enough training, these systems really will be able to block content in line with your own preferences.
To learn more about how machine learning can protect you right now, please download a free trial of Panda Dome.