Colonization has sociohistorically shaped people's identities across many dimensions, and sociotechnical systems continue to perpetuate colonial values and biases. Sentiment analysis tools are one such category of sociotechnical system. Although they are often used to guide practices such as content moderation, less attention has been paid to how they may be complicit in perpetuating coloniality. In this paper, we explore potential bias in sentiment analysis tools in the context of Bengali communities, which have experienced and continue to experience the impacts of colonialism. Drawing on the identity categories most impacted by colonialism among local Bengali communities, we focused our analytic attention on gender, religion, and nationality. We conducted an algorithmic audit of all sentiment analysis tools for Bengali available on the Python Package Index (PyPI) and GitHub. Our analyses showed that, even for inputs with similar semantic content and structure, the tools not only produce inconsistent outputs from one tool to another but also exhibit bias between different identity categories and respond differently to different ways of expressing identity. Connecting our findings with the colonially shaped sociocultural structures of Bengali communities, we discuss the implications of downstream bias in sentiment analysis tools.