This paper compares pre-trained and fine-tuned large language models (LLMs) for hate speech detection. Our results underscore the challenges LLMs face in cross-domain validity and the risk of overfitting. Our evaluations highlight the need for fine-tuned models that grasp the nuances of hate speech through greater label heterogeneity. We conclude with a vision for the future of hate speech detection, emphasizing cross-domain generalizability and appropriate benchmarking practices.