What Makes a Code Review Useful to OpenDev Developers? An Empirical Investigation

Context: Because code reviews (CRs) demand significant effort, even a minor improvement in their effectiveness can yield significant savings for a software development organization. Aim: This study aims to develop a finer-grained understanding of what makes a code review comment useful to OSS developers, to what extent a code review comment is considered useful to them, and how various contextual and participant-related factors influence its usefulness level. Method: To this end, we conducted a three-stage mixed-method study. We randomly selected 2,500 CR comments from the OpenDev Nova project and manually categorized them. We designed a survey of OpenDev developers to better understand their perspectives on useful CRs. Combining our survey-obtained scores with our manually labeled dataset, we trained two regression models: one to identify factors that influence the usefulness of CR comments, and the other to identify factors that improve the odds of identifying 'Functional' defects over other types. Key findings: The results of our study suggest that a CR comment's usefulness is dictated not only by its technical contributions, such as defect findings or quality improvement tips, but also by its linguistic characteristics, such as comprehensibility and politeness. While a reviewer's coding experience is positively associated with CR usefulness, the number of mutual reviews, the comment volume in a file, the total number of lines added/modified, and the CR interval have the opposite association. Although authorship and reviewership experience for the files under review have been the most popular attributes in reviewer recommendation systems, we do not find any significant association between those attributes and CR usefulness.
