Context: Modern software systems (e.g., Apache Spark) are usually written in multiple programming languages (PLs). There is little understanding of the phenomenon of multi-programming-language commits (MPLCs), which involve modified source files written in multiple PLs. Objective: This work aims to explore MPLCs and their impacts on development difficulty and software quality. Methods: We performed an empirical study on eighteen non-trivial Apache projects with 197,566 commits. Results: (1) the most commonly used PL combination consists of all four PLs, i.e., C/C++, Java, JavaScript, and Python; (2) 9% of the commits from all the projects are MPLCs, and the proportion of MPLCs in 83% of the projects converges to a relatively stable level; (3) more than 90% of the MPLCs from all the projects involve source files in two PLs; (4) the change complexity of MPLCs is significantly higher than that of non-MPLCs; (5) issues fixed in MPLCs take significantly longer to be resolved than issues fixed in non-MPLCs in 89% of the projects; (6) MPLCs do not show significant effects on issue reopening; (7) source files undergoing MPLCs tend to be more bug-prone; and (8) MPLCs introduce more bugs than non-MPLCs. Conclusions: MPLCs are related to increased development difficulty and decreased software quality.
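To make the MPLC definition concrete, the following is a minimal sketch of how a commit could be classified from the paths of its modified files. It is an illustration under stated assumptions, not the study's actual tooling: the extension-to-language map `EXT_TO_PL` and the helper names `languages_in_commit`, `is_mplc`, and `mplc_proportion` are hypothetical, and the abstract does not say how source files were mapped to PLs.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-PL map covering the four PLs named in the abstract.
# The mapping actually used in the study is not given here.
EXT_TO_PL = {
    ".c": "C/C++", ".h": "C/C++", ".cc": "C/C++", ".cpp": "C/C++", ".hpp": "C/C++",
    ".java": "Java",
    ".js": "JavaScript",
    ".py": "Python",
}


def languages_in_commit(modified_paths):
    """Return the set of PLs among the modified source files of one commit."""
    pls = set()
    for path in modified_paths:
        ext = PurePosixPath(path).suffix.lower()
        # Files with unmapped extensions (docs, configs, ...) are ignored.
        if ext in EXT_TO_PL:
            pls.add(EXT_TO_PL[ext])
    return pls


def is_mplc(modified_paths):
    """A commit counts as an MPLC if its modified source files span two or more PLs."""
    return len(languages_in_commit(modified_paths)) >= 2


def mplc_proportion(commits):
    """Fraction of commits that are MPLCs; each commit is a list of modified paths."""
    if not commits:
        return 0.0
    return sum(is_mplc(paths) for paths in commits) / len(commits)
```

For example, a commit touching `core/Main.java` and `python/util.py` would count as an MPLC under this sketch, while one touching only `a.c`, `b.h`, and `README.md` would not, since both source files map to C/C++.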