The open-source software (OSS) community has historically been dominated by English, the primary language for code, documentation, and developer interactions. However, with growing global participation and better support for non-Latin scripts through standards like Unicode, OSS is gradually becoming more multilingual. This study investigates the extent of that shift by analyzing 9.14 billion GitHub issues, pull requests, and discussions, together with 62,500 repositories spanning five programming languages and 30 natural languages, over the period from 2015 to 2025. We examine six research questions to track changes in language use across communication, code, and documentation. We find that multilingual participation has steadily increased, especially in Korean, Chinese, and Russian. This growth appears not only in issues and discussions but also in code comments, string literals, and documentation files. While this shift reflects greater inclusivity and language diversity in OSS, it also creates language tension: the ability to express oneself in a native language can clash with shared norms around English use, especially in collaborative settings. Non-English or multilingual projects tend to receive less visibility and participation, suggesting that language remains both a resource and a barrier, shaping who gets heard, who contributes, and how open collaboration unfolds.