This paper offers a comprehensive survey of Arabic datasets on online toxic language. We systematically collected 49 available datasets, together with their corresponding papers, and analyzed them against 16 criteria spanning three primary dimensions: content, annotation process, and reusability. This analysis enabled us to identify existing gaps and make recommendations for future research.