While existing work on bias in NLP focuses on negative or pejorative language use, Govindarajan et al. (2023) offer a revised framing of bias in terms of intergroup social context and its effects on language behavior. In this paper, we investigate whether two pragmatic features (specificity and affect) vary systematically across intergroup contexts, thus connecting this new framing of bias to language output. Preliminary analysis finds modest correlations between the specificity and affect of tweets and supervised intergroup relationship (IGR) labels. Counterfactual probing further reveals that while neural models finetuned to predict IGR labels reliably use affect in classification, the evidence for their use of specificity is inconclusive. Code and data can be found at: https://github.com/venkatasg/intergroup-probing