Using Knowledge Units of Programming Languages to Recommend Reviewers for Pull Requests: An Empirical Study

Code review is a key element of quality assurance in software development. Determining the right reviewer for a given code change requires understanding the characteristics of the changed code, identifying the skills of each potential reviewer (expertise profile), and finding a good match between the two. To facilitate this task, we design a code reviewer recommender that operates on the knowledge units (KUs) of a programming language. We define a KU as a cohesive set of key capabilities offered by one or more building blocks of a given programming language. We operationalize KUs using certification exams for the Java programming language. We detect KUs in 10 actively maintained Java projects on GitHub, spanning 290K commits and 65K pull requests (PRs). Next, we generate developer expertise profiles based on the detected KUs. We then use these KU-based expertise profiles to build a code reviewer recommender (KUREC). In RQ1, we observe that KUREC performs as well as the top-performing baseline recommender (RF). From a practical standpoint, KUREC's performance is more stable than that of RF (lower interquartile range), which makes it more consistent and potentially more trustworthy. In RQ2, we design three new recommenders by combining KUREC with our baseline recommenders. These combined recommenders outperform both KUREC and the individual baselines. Finally, in RQ3, we evaluate how reasonable the recommendations of KUREC and the combined recommenders are when they deviate from the ground truth. Taken together, the results from all RQs lead us to conclude that KUREC and one of the combined recommenders (AD_FREQ) are overall superior to the baseline recommenders that we studied. Future work in the area should thus (i) consider KU-based recommenders as baselines and (ii) experiment with combined recommenders.
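The abstract does not specify how KUREC scores candidate reviewers, so the following is only a minimal Python sketch of the general idea under stated assumptions: each PR is assumed to have already been mapped to a set of KUs detected in its changed code, a developer's expertise profile is assumed to be a simple count of how often each KU appeared in that developer's past PRs, and candidates are ranked by summed KU experience. The function names (`build_profiles`, `recommend`, `iqr`), the input format, and the KU labels are hypothetical illustrations, not the paper's implementation. The `iqr` helper shows how the interquartile range used to compare stability can be computed from a list of per-evaluation scores.

```python
from collections import Counter
from statistics import quantiles


def build_profiles(history):
    """Build hypothetical KU-based expertise profiles.

    history: iterable of (developer, set_of_KUs) pairs, one per past PR the
    developer contributed to. This input format is an assumption.
    """
    profiles = {}
    for dev, kus in history:
        # Count how many past PRs exercised each KU for this developer.
        profiles.setdefault(dev, Counter()).update(kus)
    return profiles


def score(pr_kus, profile):
    # Assumed scoring rule: sum the developer's experience with each KU the
    # PR touches. Counter returns 0 for KUs the developer has never seen.
    return sum(profile[ku] for ku in pr_kus)


def recommend(pr_kus, profiles, k=3, exclude=()):
    """Return up to k developers ranked by KU overlap with the PR.

    exclude lets the caller drop ineligible candidates (e.g. the PR author).
    Candidates with a score of 0 are not recommended, so fewer than k
    names may be returned.
    """
    candidates = [
        (score(pr_kus, prof), dev)
        for dev, prof in profiles.items()
        if dev not in exclude
    ]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [dev for s, dev in candidates[:k] if s > 0]


def iqr(values):
    """Interquartile range (Q3 - Q1); a smaller value means more stable scores.

    Uses statistics.quantiles with its default 'exclusive' method, which
    needs at least two values.
    """
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


if __name__ == "__main__":
    # Illustrative data only; the KU labels are hypothetical.
    history = [
        ("alice", {"Concurrency", "Collections"}),
        ("alice", {"Concurrency"}),
        ("bob", {"Generics", "Collections"}),
        ("carol", {"Exceptions"}),
    ]
    profiles = build_profiles(history)
    print(recommend({"Concurrency", "Collections"}, profiles, k=2))
    # → ['alice', 'bob']
```

In this toy history, `alice` scores 3 (two Concurrency PRs plus one Collections PR), `bob` scores 1, and `carol` scores 0, so the top two are `['alice', 'bob']`. A combined recommender such as AD_FREQ would merge a ranking like this with a baseline's ranking; how the paper performs that merge is not described here, so it is left out of the sketch.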