The growing prevalence and rapid evolution of offensive language on social media amplify the complexity of detection, particularly the challenge of identifying such content across diverse languages. This survey presents a systematic and comprehensive review of Cross-Lingual Transfer Learning (CLTL) techniques for offensive language detection in social media. Our study is the first holistic overview to focus exclusively on the cross-lingual scenario in this domain. We analyse 67 relevant papers and categorise them along several dimensions, including the characteristics of the multilingual datasets used, the cross-lingual resources employed, and the specific CLTL strategies implemented. Based on "what to transfer", we also summarise three main CLTL approaches: instance transfer, feature transfer, and parameter transfer. Additionally, we shed light on current challenges and future research opportunities in this field. Finally, we have made our survey resources available online, including two comprehensive tables that provide accessible references to the multilingual datasets and CLTL methods used in the reviewed literature.