In the evolving landscape of online communication, hate speech detection remains a formidable challenge, further compounded by the diversity of digital platforms. This study investigates the effectiveness and adaptability of pre-trained and fine-tuned Large Language Models (LLMs) in identifying hate speech, addressing three central questions: (1) To what extent does model performance depend on the fine-tuning and training parameters? (2) To what extent do models generalize to cross-domain hate speech detection? (3) What specific features of the datasets or models influence the generalization potential? Our experiments show that LLMs offer a substantial advantage over the state of the art even without pretraining. To answer (1), we analyze 36 in-domain classifiers comprising LLaMA, Vicuna, and their variations in pre-trained and fine-tuned states across nine publicly available datasets that span a wide range of platforms and discussion forums. To answer (2), we assess the performance of 288 out-of-domain classifiers for a given end-domain dataset. To answer (3), ordinary least squares analyses suggest that the advantage of training with fine-grained hate speech labels is greater for smaller training datasets but diminishes as dataset size increases. We conclude with a vision for the future of hate speech detection, emphasizing cross-domain generalizability and appropriate benchmarking practices.
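The classifier counts in the abstract are consistent with a simple train/test grid, which the following minimal sketch enumerates. It rests on assumptions not stated in the abstract: that each of the nine datasets is paired with four model configurations (inferred only from 36 / 9 = 4), and that every in-domain classifier is evaluated on each of the eight remaining datasets (36 × 8 = 288). The dataset and configuration names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical placeholders: the abstract names neither the nine datasets
# nor the individual model configurations.
DATASETS = [f"dataset_{i}" for i in range(1, 10)]  # nine datasets (stated)
CONFIGS = [f"config_{j}" for j in range(1, 5)]     # four configurations (assumed: 36 / 9)


def in_domain_runs(datasets, configs):
    """Each configuration is trained and tested on the same dataset."""
    return [(cfg, ds, ds) for cfg, ds in product(configs, datasets)]


def cross_domain_runs(datasets, configs):
    """Each configuration is trained on one dataset and tested on a different one."""
    return [
        (cfg, train, test)
        for cfg, train, test in product(configs, datasets, datasets)
        if train != test
    ]


in_domain = in_domain_runs(DATASETS, CONFIGS)
cross_domain = cross_domain_runs(DATASETS, CONFIGS)

print(len(in_domain), "in-domain runs")
print(len(cross_domain), "cross-domain runs")
```

Under these assumptions the grid reproduces the 36 in-domain and 288 out-of-domain counts; the actual pairing of models and datasets in the study may differ.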