In recent years, machine translation has become highly successful for high-resource language pairs. This success has also sparked new interest in research on the automatic translation of low-resource languages, including Indigenous languages. However, Indigenous languages are deeply tied to the ethnic and cultural groups that speak (or used to speak) them. Collecting data for, modeling, and deploying machine translation systems therefore raises new ethical questions that must be addressed. Motivated by this, we first survey the existing literature on ethical considerations for the documentation and translation of Indigenous languages and for natural language processing on them more generally. We then conduct and analyze an interview study to shed light on the positions of community leaders, teachers, and language activists regarding ethical concerns about the automatic translation of their languages. Our results show that including native speakers and community members, to varying degrees, is vital to conducting better and more ethical research on Indigenous languages.