Artificial intelligence (AI) has the potential to transform healthcare, education, governance and socioeconomic equity, but its benefits remain concentrated in a small number of languages (Bender, 2019; Blasi et al., 2022; Joshi et al., 2020; Ranathunga and de Silva, 2022; Young, 2015). Language AI, the set of technologies that underpin widely used conversational systems such as ChatGPT, could provide major benefits if available in people's native languages, yet most of the world's 7,000+ linguistic communities currently lack access and face persistent digital marginalization. Here we present a global longitudinal analysis of social, economic and infrastructural conditions across languages to assess systemic inequalities in language AI. We first analyze the existence of AI resources for 6,003 languages. We find that, despite community efforts to broaden the reach of language technologies (Bapna et al., 2022; Costa-Jussà et al., 2022), the dominance of a handful of languages is exacerbating disparities on an unprecedented scale, with divides widening exponentially rather than narrowing. Further, we contrast the longitudinal diffusion of AI with that of earlier information technologies, revealing a distinctive hype-driven pattern of spread. To translate our findings into practical insights and guide prioritization efforts, we introduce the Language AI Readiness Index (EQUATE), which maps the state of technological, socioeconomic and infrastructural prerequisites for AI deployment across languages. The index highlights communities where capacity exists but remains underutilized, and provides a framework for accelerating more equitable diffusion of language AI. Our work contributes to setting the baseline for a transition towards more sustainable and equitable language technologies.