This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are and examining the inseparable social and computational factors that create such linguistic hierarchies. Through a case study of our mother tongue Bengali, a global language spoken by almost 300 million people yet still classified as low-resource, we demonstrate how gender is assumed and inferred in translations to and from the high(est)-resource English when the source text provides no such information. We discuss the postcolonial and societal impacts of such errors, which lead to linguistic erasure and representational harms, and conclude with potential solutions for uplifting languages by giving them more agency in MT conversations.