Testing Hateful Speeches against Policies

In recent years, many software systems have adopted AI techniques, especially deep learning. Because of their black-box nature, AI-based systems pose challenges to traceability: their behavior is determined by models and data, whereas requirements or policies are rules written in natural or programming language. To the best of our knowledge, few studies have examined how AI and deep neural network-based systems behave against rule-based requirements and policies. This experience paper examines deep neural network behavior against rule-based requirements described in natural language policies. In particular, we present a case study that checks AI-based content moderation software against content moderation policies. First, using crowdsourcing, we collect natural language test cases that match each moderation policy; we name this dataset HateModerate. Second, using the test cases in HateModerate, we measure the failure rates of state-of-the-art hate speech detection software and find that these models have high failure rates for certain policies. Finally, since manual labeling is costly, we propose an automated approach that augments HateModerate by fine-tuning OpenAI's large language models to automatically match new examples to policies. The dataset and code for this work are available on our anonymous website (https://sites.google.com/view/content-moderation-project).
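The per-policy failure rate mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the `PolicyCase` record, the policy identifiers `P1` and `P2`, the placeholder texts, and the keyword-based `toy_classifier` are all hypothetical stand-ins, and the sketch assumes that a "failure" means the classifier's decision disagrees with the label the policy assigns to a test case.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class PolicyCase(NamedTuple):
    policy: str        # hypothetical policy identifier
    text: str          # example sentence matched to that policy
    should_flag: bool  # True if the policy says the text must be moderated


def failure_rates(cases: Iterable[PolicyCase],
                  classifier: Callable[[str], bool]) -> Dict[str, float]:
    """Return, per policy, the fraction of cases the classifier gets wrong."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.policy] += 1
        # A failure is any disagreement with the policy-derived label.
        if classifier(case.text) != case.should_flag:
            failures[case.policy] += 1
    return {policy: failures[policy] / totals[policy] for policy in totals}


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in for a real detector: flags a placeholder token."""
    return "[slur]" in text


# Placeholder test cases; real cases would come from the dataset.
cases = [
    PolicyCase("P1", "group X are [slur]", True),
    PolicyCase("P1", "group X should be excluded", True),
    PolicyCase("P2", "I dislike rainy days", False),
]

rates = failure_rates(cases, toy_classifier)
for policy, rate in sorted(rates.items()):
    print(f"{policy}: {rate:.0%} failure rate")
```

Grouping results by policy rather than reporting a single aggregate score is what exposes the policies a model handles poorly; a real evaluation would replace `toy_classifier` with a call to the detection model under test.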

