Recent studies have raised the alarm that much online hate speech is implicit. Because of its subtle nature, explaining the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used to provide natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their quality in comparison with human-written NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.