NoisyHate: Benchmarking Content Moderation Machine Learning Models with Human-Written Perturbations Online

Toxic content in online text is a threat on social media and can lead to cyber harassment. Although many platforms have deployed countermeasures, such as machine learning-based hate-speech detection systems, publishers of toxic content can still evade these systems by altering the spelling of toxic words. These altered words are known as human-written text perturbations. Prior work has developed techniques for generating adversarial samples that help machine learning models learn to recognize such perturbations. However, there is still a gap between machine-generated perturbations and human-written ones. In this paper, we introduce a benchmark test set of human-written perturbations found online for evaluating toxic speech detection models. We recruited a group of workers to evaluate the quality of this test set and dropped low-quality samples. To check whether the perturbations can be normalized to their clean versions, we applied spell-correction algorithms to the dataset. Finally, we tested the data on state-of-the-art language models, such as BERT and RoBERTa, and on black-box APIs, such as Perspective API, to demonstrate that adversarial attacks using real human-written perturbations remain effective.
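The normalization check described in the abstract can be illustrated with a minimal sketch. This is not the spell corrector used in the paper: the substitution table, the `normalize` and `recovery_rate` helpers, and the example pairs are all hypothetical and chosen only to show how one might measure whether a perturbed word is mapped back to its clean form.

```python
import re

# Hypothetical look-alike character table; not taken from the paper.
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "@": "a", "$": "s", "!": "i",
})


def normalize(token: str) -> str:
    """Lowercase a token, map look-alike characters to letters,
    and collapse any run of three or more identical characters to one."""
    mapped = token.lower().translate(SUBSTITUTIONS)
    return re.sub(r"(.)\1{2,}", r"\1", mapped)


def recovery_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (perturbed, clean) pairs whose perturbed form
    normalizes exactly to the clean form."""
    if not pairs:
        return 0.0
    recovered = sum(1 for perturbed, clean in pairs if normalize(perturbed) == clean)
    return recovered / len(pairs)


# Illustrative (perturbed, clean) pairs; assumed examples, not dataset samples.
EXAMPLE_PAIRS = [
    ("1d10t", "idiot"),       # digit substitutions
    ("stuuuupid", "stupid"),  # character repetition
    ("st00pid", "stupid"),    # phonetic respelling that a character map cannot undo
]

if __name__ == "__main__":
    print(f"recovery rate: {recovery_rate(EXAMPLE_PAIRS):.2f}")
```

A rule-based normalizer of this kind handles simple substitutions and repetitions but misses phonetic respellings, which is one way the gap between simple normalization and human-written perturbations can show up.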

