Sociotechnical systems, such as language technologies, frequently exhibit identity-based biases. These biases worsen the experiences of historically marginalized communities and remain understudied in low-resource contexts. While language-specific and multilingual models and datasets are commonly recommended to address these biases, this paper empirically tests the effectiveness of such approaches for gender-, religion-, and nationality-based identities in Bengali, a widely spoken but low-resourced language. We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT, which were fine-tuned using all Bengali sentiment analysis (BSA) datasets from Google Dataset Search. Our analyses showed that BSA models exhibit biases across different identity categories, even for inputs with similar semantic content and structure. We also examined the inconsistencies and uncertainties that arise from combining pre-trained models with datasets created by individuals from diverse demographic backgrounds. We connected these findings to broader discussions on epistemic injustice, AI alignment, and methodological decisions in algorithmic audits.