This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are, examining the inseparable social and computational factors that create such linguistic hierarchies. Through a case study of our mother tongue Bengali, a global language spoken by almost 300 million people but still classified as low-resource, we demonstrate how gender is assumed and inferred in translations to and from the high(est)-resource English when no such information is provided in source texts. We discuss the postcolonial and societal impacts of such errors, which lead to linguistic erasure and representational harms, and conclude with potential solutions for uplifting languages by giving them more agency in MT conversations.