In the evolving landscape of online communication, hate speech detection remains a formidable challenge, further compounded by the diversity of digital platforms. This study investigates the effectiveness and adaptability of pre-trained and fine-tuned Large Language Models (LLMs) in identifying hate speech, addressing three central questions: (1) To what extent does model performance depend on the fine-tuning and training parameters? (2) To what extent do models generalize to cross-domain hate speech detection? (3) Which specific features of the datasets or models influence their generalization potential? Our experiments show that LLMs offer a significant advantage over the state of the art even without pretraining. Ordinary least squares analyses suggest that the advantage of training with fine-grained hate speech labels is washed away as dataset size increases. We conclude with a vision for the future of hate speech detection, emphasizing cross-domain generalizability and appropriate benchmarking practices.
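The interaction effect described above (the benefit of fine-grained labels fading as dataset size grows) can be illustrated with a minimal ordinary-least-squares sketch. All data below are synthetic and all coefficients hypothetical; this is not the paper's actual analysis, only a toy regression of a performance score on a fine-grained-label indicator, log dataset size, and their interaction, solved via the normal equations in pure Python:

```python
# Toy OLS with an interaction term (hypothetical data, not the paper's
# measurements): a negative interaction coefficient means the
# fine-grained-label advantage shrinks as dataset size grows.

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Coefficients beta = (X'X)^{-1} X'y via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    return solve(XtX, Xty)

# Synthetic scores built from known coefficients: intercept 0.70,
# fine-grained bonus 0.08, +0.01 per log-unit of size, and a
# negative interaction (-0.02) that erodes the bonus at larger sizes.
rows, ys = [], []
for fine in (0, 1):
    for log_size in (8, 10, 12, 14):
        rows.append([1.0, fine, log_size, fine * log_size])
        ys.append(0.70 + 0.08 * fine + 0.01 * log_size - 0.02 * fine * log_size)

beta = ols(rows, ys)
# beta recovers [0.70, 0.08, 0.01, -0.02]; the negative last entry is
# the "washed away with dataset size" pattern described in the abstract.
```

Because the synthetic targets are generated exactly from the model, OLS recovers the coefficients to floating-point precision; with real, noisy benchmark scores the same regression would yield estimates with standard errors to inspect.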