Hate speech is a severe issue that affects many online platforms. Several studies have sought to develop robust hate speech detection systems. Large language models such as ChatGPT have recently shown great promise on a range of tasks, including hate speech detection. However, building robust detection systems requires understanding the limitations of these models. To bridge this gap, our study evaluates the strengths and weaknesses of the ChatGPT model in detecting hate speech at a granular level across 11 languages. Our evaluation employs a series of functionality tests that reveal intricate failures of the model that aggregate metrics such as macro F1 or accuracy cannot expose. In addition, we investigate how complex emotional cues, such as emojis used in hate speech, influence the performance of the ChatGPT model. Our analysis highlights the shortcomings of generative models in detecting certain types of hate speech and underscores the need for further research and improvements to these models.
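To illustrate why functionality tests can surface failures that an aggregate score hides, the following minimal sketch compares overall accuracy with accuracy broken down per functionality. The functionality names (`explicit_slur`, `emoji_substitution`), the label strings, and the toy records are all hypothetical and synthetic; they are not taken from the study and do not reflect ChatGPT's measured behavior.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Fraction of records whose predicted label matches the gold label."""
    records = list(records)
    return sum(gold == pred for _, gold, pred in records) / len(records)


def accuracy_by_functionality(records):
    """Accuracy computed separately for each functionality name.

    Each record is a tuple: (functionality, gold_label, predicted_label).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for functionality, gold, pred in records:
        totals[functionality] += 1
        correct[functionality] += int(gold == pred)
    return {name: correct[name] / totals[name] for name in totals}


# Synthetic toy data (an assumption for illustration only):
# eight correctly classified cases in one hypothetical functionality,
# two misclassified cases in another.
toy_records = (
    [("explicit_slur", "hateful", "hateful")] * 4
    + [("explicit_slur", "non-hateful", "non-hateful")] * 4
    + [("emoji_substitution", "hateful", "non-hateful")] * 2
)

if __name__ == "__main__":
    print("overall:", overall_accuracy(toy_records))
    for name, acc in accuracy_by_functionality(toy_records).items():
        print(f"{name}: {acc}")
```

On this toy data the aggregate accuracy looks acceptable, while the per-functionality breakdown shows one category failing entirely, which is the kind of gap a functionality-level evaluation is meant to expose.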