The Upper Bound of Information Diffusion in Code Review

Background: Code review, the discussion around a code change among humans, forms a communication network that enables its participants to exchange and spread information. Although qualitative studies have reported this capability, our understanding of code review as a communication network is still limited. Objective: In this article, we report on a first step towards evaluating the capability of code review as a communication network by quantifying how fast and how far information can spread through code review: the upper bound of information diffusion in code review. Method: In an in-silico experiment, we simulate an artificial information diffusion within large (Microsoft), mid-sized (Spotify), and small (Trivago) code review systems modelled as communication networks. We then measure the minimal topological and temporal distances between the participants to quantify how far and how fast information can spread in code review. Results: An average code review participant in the small and mid-sized code review systems can spread information to between 72% and 85% of all code review participants within four weeks, independently of network size and tooling; for the large code review system, we found an absolute boundary of about 11,000 reachable participants. On average (median), information can spread between two participants in code review in fewer than five hops and less than five days. Conclusion: We found evidence that the communication network emerging from code review scales well and spreads information fast and broadly, corroborating the findings of prior qualitative work. The study lays the foundation for understanding and improving code review as a communication network.
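To make the notion of a minimal temporal distance concrete, the following is a minimal sketch of an earliest-arrival computation on a time-stamped contact list. It is an illustration under stated assumptions, not the simulation used in the study: the pairwise contact format `(a, b, t)`, the function names `earliest_arrival` and `reach_fraction`, the rule that a participant can forward information only strictly after receiving it, and the toy data are all hypothetical.

```python
from math import inf


def earliest_arrival(contacts, source, start=0.0, horizon=inf):
    """Earliest time at which each participant can hold information
    that `source` holds at time `start`.

    contacts: iterable of (a, b, t) tuples, meaning a and b took part in
              the same review discussion at time t (hypothetical format).
    horizon:  only contacts with start <= t <= start + horizon are used.
    """
    arrival = {source: start}
    # One pass over the contacts in chronological order is enough, because
    # forwarding is only allowed strictly after the information was received.
    for a, b, t in sorted(contacts, key=lambda c: c[2]):
        if t < start or t > start + horizon:
            continue
        for sender, receiver in ((a, b), (b, a)):
            known = arrival.get(sender, inf)
            # The source may pass information on at `start` itself.
            can_send = known < t or (sender == source and known <= t)
            if can_send and t < arrival.get(receiver, inf):
                arrival[receiver] = t
    return arrival


def reach_fraction(contacts, source, start=0.0, horizon=inf):
    """Share of all other participants reachable from `source`."""
    people = {p for a, b, _ in contacts for p in (a, b)} | {source}
    if len(people) < 2:
        return 0.0
    reached = earliest_arrival(contacts, source, start, horizon)
    return (len(reached) - 1) / (len(people) - 1)


# Toy data: times are in days, names are made up.
contacts = [
    ("alice", "bob", 1),
    ("carol", "dave", 2),  # happens before carol is informed
    ("bob", "carol", 3),
    ("dave", "erin", 5),
]
arrival = earliest_arrival(contacts, "alice")
fraction = reach_fraction(contacts, "alice")
```

In this toy network, dave and erin are never reached from alice even though a static path exists, because the carol-dave contact takes place before the information arrives at carol. The order of contacts in time is what separates a temporal distance from a purely topological one.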
