Due to the broad range of social media platforms, the requirements of abusive language detection systems are varied and ever-changing. A large set of annotated corpora with different properties and label sets has already been created for tasks such as hate speech or misogyny detection, but the form and targets of abusive speech are constantly evolving. Since annotating new corpora is expensive, in this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection. Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples from the target domain. We propose a two-step approach: first, we train our model in a multitask fashion; we then carry out few-shot adaptation to the target requirements. Our experiments show that, using already existing datasets and only a few shots of the target task, model performance improves both monolingually and across languages. Our analysis also shows that our models acquire a general understanding of abusive language: they improve the prediction of labels that are present only in the target dataset, and they benefit from knowledge about labels that are not directly used for the target task.
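The two-step recipe described above (multitask training on existing datasets, then few-shot adaptation to a new target label set) can be sketched in miniature. The following is a hypothetical illustration, not the paper's actual implementation: a tiny shared encoder with one binary head per task stands in for a shared pretrained language model with task-specific classifiers, and all dataset names and dimensions are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # binary cross-entropy, clipped for numerical stability
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

class MultitaskAbuseClassifier:
    """Toy stand-in: shared encoder + one binary head per task."""

    def __init__(self, d_in, d_hid, tasks, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (d_in, d_hid))            # shared parameters
        self.heads = {t: rng.normal(0.0, 0.1, d_hid) for t in tasks}

    def encode(self, X):
        return np.tanh(X @ self.W)

    def predict(self, X, task):
        return sigmoid(self.encode(X) @ self.heads[task])

    def step(self, X, y, task, lr=0.1, update_shared=True):
        """One gradient step on the cross-entropy of one task."""
        h = self.encode(X)                                       # (n, d_hid)
        p = sigmoid(h @ self.heads[task])                        # (n,)
        err = (p - y) / len(y)                                   # dL/dlogit
        if update_shared:
            g_h = np.outer(err, self.heads[task]) * (1 - h ** 2)
            self.W -= lr * (X.T @ g_h)
        self.heads[task] -= lr * (h.T @ err)
        return bce(p, y)

    def add_task(self, task, seed=1):
        # new head for a previously unseen target label set
        rng = np.random.default_rng(seed)
        self.heads[task] = rng.normal(0.0, 0.1, self.W.shape[1])

# Step 1: multitask training on existing (synthetic) abusive-language tasks.
rng = np.random.default_rng(42)
X_hate = rng.normal(size=(64, 8)); y_hate = (X_hate[:, 0] > 0).astype(float)
X_miso = rng.normal(size=(64, 8)); y_miso = (X_miso[:, 1] > 0).astype(float)

model = MultitaskAbuseClassifier(d_in=8, d_hid=16, tasks=["hate", "misogyny"])
for _ in range(200):                       # alternate tasks each epoch
    model.step(X_hate, y_hate, "hate")
    model.step(X_miso, y_miso, "misogyny")

# Step 2: few-shot adaptation — reuse the shared encoder, fit only a new
# head on a handful of target-domain examples.
X_few = rng.normal(size=(8, 8)); y_few = (X_few[:, 0] + X_few[:, 1] > 0).astype(float)
model.add_task("target")
losses_fewshot = [model.step(X_few, y_few, "target", update_shared=False)
                  for _ in range(100)]

probs = model.predict(X_few, "target")
```

Whether the shared encoder is frozen during adaptation (as in this sketch, via `update_shared=False`) or fine-tuned with a small learning rate is a design choice; freezing is the cheaper option when only a few target examples are available.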