In recent years, many software systems have adopted AI techniques, especially deep learning. Because of their black-box nature, AI-based systems pose challenges for traceability: their behavior is determined by models and data, whereas requirements and policies are rules expressed in natural or programming languages. To the best of our knowledge, few studies have examined how AI and deep neural network-based systems behave against rule-based requirements and policies. This experience paper examines deep neural network behavior against rule-based requirements described in natural language policies. In particular, we present a case study that checks AI-based content moderation software against content moderation policies. First, using crowdsourcing, we collect natural language test cases that match each moderation policy; we name this dataset HateModerate. Second, using the test cases in HateModerate, we measure the failure rates of state-of-the-art hate speech detection software and find that these models have high failure rates for certain policies. Finally, since manual labeling is costly, we propose an automated approach to augment HateModerate by fine-tuning OpenAI's large language models to automatically match new examples to policies. The dataset and code of this work can be found on our anonymous website: \url{https://sites.google.com/view/content-moderation-project}.