The growing prevalence and rapid evolution of offensive language on social media make detection increasingly difficult, particularly across diverse languages. This survey presents a systematic and comprehensive exploration of Cross-Lingual Transfer Learning (CLTL) techniques for offensive language detection in social media. Our study is the first holistic overview to focus exclusively on the cross-lingual scenario in this domain. We analyse 67 relevant papers and categorise them along several dimensions, including the characteristics of the multilingual datasets used, the cross-lingual resources employed, and the specific CLTL strategies implemented. Based on "what to transfer", we also identify three main CLTL approaches: instance transfer, feature transfer, and parameter transfer. Additionally, we highlight current challenges and future research opportunities in this field. Finally, we have made our survey resources available online, including two comprehensive tables that provide accessible references to the multilingual datasets and CLTL methods used in the reviewed literature.