Biases introduced into text by generative models have become an increasingly prominent research topic in recent years. In this paper, we explore how machine translation might introduce a bias in sentiments as classified by sentiment analysis models. To this end, we compare three open-access machine translation models across five languages on two parallel corpora to test whether the translation process causes a shift in the sentiment classes recognized in the texts. Although our statistical tests indicate shifts in the label probability distributions, we find none that appears consistent enough to assume a bias induced by the translation process.