While existing work on bias in NLP focuses on negative or pejorative language use, Govindarajan et al. (2023) offer a revised framing of bias in terms of intergroup social context and its effects on language behavior. In this paper, we investigate whether two pragmatic features (specificity and affect) systematically vary across intergroup contexts, thus connecting this new framing of bias to language output. Preliminary analysis finds modest correlations between the specificity and affect of tweets and supervised intergroup relationship (IGR) labels. Counterfactual probing further reveals that while neural models finetuned to predict IGR labels reliably use affect in classification, the models' use of specificity is inconclusive. Code and data can be found at: https://github.com/venkatasg/intergroup-probing