A team of researchers has developed a high-accuracy, deep-learning classifier designed to detect YouTube videos with content disturbing to young children. The work followed their finding that the recommendation algorithm the platform currently uses to suggest related content does a poor job of filtering such videos out.
The research was prompted by the increasing number of young children whose attention is drawn to modern video consumption platforms such as Google’s YouTube, which offers an almost limitless amount of toddler-tailored content.
Although most of this content is accurately categorized by YouTube’s internal algorithms, the study found that the counter-measures put in place to protect young viewers are ineffective at detecting disturbing videos in a timely fashion.
As described by the researchers, videos that would be classified by the Motion Picture Association of America (MPAA) as PG or PG-13 are considered disturbing, while those rated R or NC-17 are considered restricted.
Using a dataset of 133,806 videos, the researchers tested their binary classifier as part of a large-scale, toddler-oriented study of YouTube content, and they found:
122K (91.4%) non-disturbing videos and 11K (8.6%) inappropriate videos. These findings highlight the gravity of the problem: a parent searching on YouTube with simple toddler-related keywords and casually selecting from the recommended videos is likely to expose their child to a substantial number of inappropriate videos.
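The rounded 122K and 11K counts line up with the stated percentages; a quick back-of-the-envelope check using only the figures reported above:

```python
# Back-of-the-envelope check of the reported dataset split.
# The 133,806 total and the 91.4% / 8.6% shares come from the study;
# the per-class counts are rounded to 122K / 11K in the article text.
TOTAL_VIDEOS = 133_806

non_disturbing = round(TOTAL_VIDEOS * 0.914)  # ~122K
inappropriate = round(TOTAL_VIDEOS * 0.086)   # ~11K

print(non_disturbing, inappropriate)
```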
Following extensive tests, the deep learning video content classifier reached an accuracy of 82.8%. Using it, the researchers concluded that a toddler watching videos on YouTube has a 45% chance of being suggested an inappropriate video within 10 hops when starting “from a video that appears among the top ten results of a toddler-appropriate keyword search (e.g., Peppa Pig).”
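The 10-hop figure reflects walks over YouTube’s recommendation graph. A minimal sketch of that style of measurement, run on a hypothetical toy graph rather than live YouTube data (the video IDs, graph, and function below are illustrative assumptions, not the authors’ code):

```python
import random

def walk_hits_inappropriate(graph, start, inappropriate,
                            hops=10, walks=10_000, seed=0):
    """Monte Carlo estimate of the probability that a viewer who
    starts at `start` and follows a random recommendation at each
    step reaches an inappropriate video within `hops` steps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(walks):
        node = start
        for _ in range(hops):
            recs = graph.get(node, [])
            if not recs:
                break  # dead end: no further recommendations
            node = rng.choice(recs)
            if node in inappropriate:
                hits += 1
                break
    return hits / walks

# Toy recommendation graph with made-up video IDs ("bad_1" stands in
# for an inappropriate video reachable from benign starting points).
graph = {
    "peppa_1": ["peppa_2", "peppa_3", "bad_1"],
    "peppa_2": ["peppa_1", "peppa_3"],
    "peppa_3": ["peppa_1", "bad_1"],
    "bad_1":   ["peppa_1"],
}
p = walk_hits_inappropriate(graph, "peppa_1", {"bad_1"})
print(f"chance of hitting an inappropriate video within 10 hops: {p:.2f}")
```

The study performed this kind of walk against YouTube’s live recommendations; the sketch only illustrates the counting logic.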
Although protecting young children from inappropriate content is partly the job of their parents, YouTube should also have accurate counter-measures in place to remove such videos from the recommendation queue, especially when kids start from benign videos.
The researchers also evaluated YouTube’s mechanisms for blocking toddler-inappropriate content and found that:
As of January 15, 2019, only 8.0% of the suitable, 10.6% of the disturbing, 4.1% of the restricted, and 2.5% of the irrelevant videos had been removed, while of those still available, 0.0%, 8.0%, 2.9%, and 0.1%, respectively, were marked as age-restricted. Alarmingly, the number of deleted disturbing and restricted videos is significantly low, and the same holds for the number of disturbing and restricted videos marked as age-restricted.
“Considering the advent of algorithmic content creation (e.g., “deep fakes”) and the monetization opportunities on sites like YouTube, there is no reason to believe there will be an organic end to this problem,” the research team concluded. “Our classifier, and the insights gained from our analysis can be used as a starting point to gain a deeper understanding and begin mitigating this issue.”
The “Disturbed YouTube for Kids: Characterizing and Detecting Disturbing Content on YouTube” paper has been authored by researchers from Cyprus University of Technology, University of Alabama at Birmingham, Telefonica Research, and Boston University, and it is publicly available on the arXiv research electronic archive.
The research project, co-authored by Kostantinos Papadamou, Antonis Papasavva, Savvas Zannettou, Jeremy Blackburn, Nicolas Kourtellis, Ilias Leontiadis, Gianluca Stringhini, and Michael Sirivianos, was funded through the European Union’s Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie ENCASE project (Grant Agreement No. 691025).