An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

The exponential growth of social media platforms has revolutionized communication and content dissemination in human society. Nevertheless, these platforms are increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography, leading to severe consequences such as harm to teenagers' mental health. Despite tremendous efforts in developing and deploying textual and image content moderation methods, malicious users can evade moderation by embedding text in images, for example screenshots of the text, usually with some added interference. We observe that the performance of modern content moderation software against such malicious inputs remains underexplored. In this work, we propose OASIS, a metamorphic testing framework for content moderation software. OASIS employs 21 transform rules summarized from our pilot study on 5,000 real-world toxic content items collected from four popular social media applications: Twitter, Instagram, Sina Weibo, and Baidu Tieba. Given toxic textual content, OASIS generates image test cases that preserve the toxicity yet are likely to bypass moderation. In the evaluation, we employ OASIS to test five commercial textual content moderation services from well-known companies (Google Cloud, Microsoft Azure, Baidu Cloud, Alibaba Cloud, and Tencent Cloud), as well as a state-of-the-art moderation research model. The results show that OASIS achieves error-finding rates of up to 100%. Moreover, retraining the moderation model with the test cases generated by OASIS improves its robustness without performance degradation.
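The metamorphic relation behind this style of testing can be illustrated with a small sketch. It assumes (this is our simplification, not necessarily the paper's exact definition) that every seed input is truly toxic, that each transform preserves toxicity, and that the error-finding rate is the fraction of generated test cases the moderator fails to flag. All names here (`FollowUp`, `error_finding_rate`, `keyword_moderator`, `identity`, `space_out`) are hypothetical. OASIS's actual 21 rules turn text into images; the sketch leaves that step abstract as an opaque `payload` and uses toy text-level transforms so it runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FollowUp:
    """A follow-up test input derived from a toxic source text."""
    rule: str        # name of the (hypothetical) transform rule applied
    payload: object  # transformed input; in OASIS this would be an image


def error_finding_rate(
    toxic_texts: Iterable[str],
    transforms: List[Callable[[str], FollowUp]],
    is_flagged: Callable[[object], bool],
) -> float:
    """Apply every transform to every known-toxic text and count follow-ups
    the moderator fails to flag. Assumed relation: toxicity is preserved,
    so the verdict should stay 'toxic'. Returns errors / test cases."""
    total = errors = 0
    for text in toxic_texts:
        for transform in transforms:
            follow_up = transform(text)
            total += 1
            if not is_flagged(follow_up.payload):
                errors += 1  # metamorphic relation violated
    return errors / total if total else 0.0


# Toy stand-ins (hypothetical): a keyword-blocklist moderator and two
# text-level transforms. Real OASIS rules render text into images instead.
BLOCKLIST = {"idiot"}


def keyword_moderator(payload: object) -> bool:
    # Flags string payloads containing any blocklisted word (case-insensitive).
    return isinstance(payload, str) and any(w in payload.lower() for w in BLOCKLIST)


def identity(text: str) -> FollowUp:
    # Baseline: leaves the input unchanged.
    return FollowUp("identity", text)


def space_out(text: str) -> FollowUp:
    # Mimics character-level interference: "idiot" -> "i d i o t".
    return FollowUp("space_out", " ".join(text))


rate = error_finding_rate(["you idiot", "idiot!"], [identity, space_out], keyword_moderator)
print(rate)  # 0.5
```

With the identity transform the blocklist catches both seed inputs, while spacing out the characters evades it for both, giving 2 errors out of 4 test cases. A real harness would swap `keyword_moderator` for a call to the moderation service under test and `space_out` for an image-generating rule.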
