Where and how language models (LMs) are deployed determines who can benefit from them. However, several challenges hinder the effective deployment of LMs in non-English-speaking and hardware-constrained communities in the Global South. We call this problem the last mile: the intersection of multilinguality and edge deployment, where the goals are aligned but the technical requirements often compete. Studying the two fields together is both a need, since linguistically diverse communities often face the most severe infrastructure constraints, and an opportunity, since edge and multilingual NLP research remain largely siloed. To understand the state of the art and the challenges of combining the two areas, we survey 232 papers that address this problem across the language modelling pipeline, from data collection to development and deployment. We also discuss open questions and provide actionable recommendations for different stakeholders in the NLP ecosystem. We hope that this work contributes to the development of inclusive and equitable language technologies.