Linguistic diversity is a human attribute that the advance of generative AI is putting under threat. Drawing on sociolinguistics, this paper examines the consequences of the variety selection bias imposed by technological applications. It also examines the resulting vicious circle, in which a variety becomes dominant and standardized, and is thereby preserved, because it has the linguistic documentation needed to train large language models.