How hate speech is interpreted depends heavily on cultural context, so interpretations vary from one individual to another. For that reason, we propose a Semantic Componential Analysis (SCA) framework for cross-cultural and cross-domain analysis of hate speech definitions. We create the first dataset of definitions drawn from five domains: online dictionaries, research papers, Wikipedia articles, legislation, and online platforms. We then decompose each definition into semantic components. Our analysis reveals that the components differ from definition to definition, yet many domains borrow definitions from one another without taking the target culture into account. Using the proposed dataset, we conduct zero-shot experiments with three popular open-source LLMs to understand the impact of different definitions on hate speech detection. Our findings indicate that LLMs are sensitive to definitions: their hate speech detection responses change with the complexity of the definition used in the prompt.
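To make the setup concrete, the following is a minimal sketch of how a definition, its semantic components, and a definition-conditioned zero-shot prompt could be represented. It is an illustration under stated assumptions, not the implementation used in this work: the `Definition` class, the component names, the prompt wording, and the helper functions are all hypothetical, and no particular LLM or API is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """One hate speech definition with its annotated semantic components.

    Hypothetical structure for illustration; field names are assumptions.
    """
    domain: str                          # e.g. "legislation", "online platform"
    text: str                            # the definition as worded in its source
    components: frozenset = frozenset()  # labels assigned during componential analysis


def shared_components(a: Definition, b: Definition) -> frozenset:
    """Return the semantic components that two definitions have in common."""
    return a.components & b.components


def build_zero_shot_prompt(definition: Definition, post: str) -> str:
    """Embed a single definition in a classification prompt with no examples."""
    return (
        f"Definition of hate speech: {definition.text}\n"
        f"Text: {post}\n"
        "According to the definition above, is the text hate speech? "
        "Answer 'yes' or 'no'."
    )


def parse_label(response: str) -> bool:
    """Map a free-text model response to a binary label using its first word."""
    words = response.strip().lower().split()
    return bool(words) and words[0].strip(".,!:;'\"") == "yes"
```

Under these assumptions, comparing two definitions reduces to a set operation on their components. Testing a model's sensitivity to definitions amounts to holding the input text fixed, varying the `Definition` passed to `build_zero_shot_prompt`, and comparing the parsed labels.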