This paper compares three approaches to detecting incivility in Persian tweets: human qualitative coding, supervised learning with ParsBERT, and large language models (ChatGPT). Using 47,278 tweets from the #MahsaAmini movement in Iran, we evaluate the accuracy and efficiency of each method. ParsBERT substantially outperforms seven evaluated ChatGPT models in identifying hate speech. We also find that ChatGPT struggles not only with subtle cases but also with explicitly uncivil content, and that prompt language (English vs. Persian) does not meaningfully affect its outputs. The study provides a detailed comparison of these approaches and clarifies their strengths and limitations for analyzing hate speech in a low-resource language context.