The Upper Bound of Information Diffusion in Code Review

Background: Code review, the discussion around a code change among humans, forms a communication network that enables its participants to exchange and spread information. Although qualitative studies have reported this role, our understanding of the capability of code review as a communication network is still limited. Objective: In this article, we report on a first step towards evaluating the capability of code review as a communication network by quantifying how fast and how far information can spread through code review: the upper bound of information diffusion in code review. Method: In an in-silico experiment, we simulate an artificial information diffusion within large (Microsoft), mid-sized (Spotify), and small (Trivago) code review systems modelled as communication networks. We then measure the minimal topological and temporal distances between the participants to quantify how far and how fast information can spread in code review. Results: An average code review participant in the small and mid-sized code review systems can spread information to between 72% and 85% of all code review participants within four weeks, independent of network size and tooling; for the large code review system, we found an absolute boundary of about 11,000 reachable participants. On average (median), information can spread between two participants in code review in fewer than five hops and less than five days. Conclusion: We found evidence that the communication network emerging from code review scales well and spreads information fast and broadly, corroborating the findings of prior qualitative work. The study lays the foundation for understanding and improving code review as a communication network.
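
To illustrate what a minimal temporal distance means in such a network, the following is a minimal sketch under strong simplifying assumptions, not the study's actual model or data. It assumes each review is a single instant (an integer day) at which all of its participants share everything they know. The function name `earliest_arrival`, the event format, and the participant names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical event format: (day of the review discussion, participants).
Event = Tuple[int, Set[str]]


def earliest_arrival(
    events: List[Event],
    source: str,
    start: int = 0,
    horizon: Optional[int] = None,
) -> Dict[str, int]:
    """Return the earliest day each reachable participant can be informed.

    Information only travels forward in time: a review spreads it only if
    at least one participant was already informed when the review happens.
    Reviews on the same day are processed in the order sorted() leaves them.
    """
    informed = {source: start}
    for day, participants in sorted(events, key=lambda e: e[0]):
        if day < start:
            continue  # the review happened before the information existed
        if horizon is not None and day > start + horizon:
            break  # outside the observation window
        if any(p in informed for p in participants):
            for p in participants:
                # Keep the first (earliest) arrival day for each participant.
                informed.setdefault(p, day)
    return informed


# Toy data, invented for illustration only.
events = [
    (1, {"alice", "bob"}),
    (3, {"bob", "carol"}),
    (2, {"carol", "dave"}),   # too early: carol is not informed until day 3
    (40, {"carol", "erin"}),  # outside a four-week (28-day) window
]
reached = earliest_arrival(events, "alice", start=0, horizon=28)
```

In this toy example, `dave` is never reached even though he shares a review with `carol`, because that review takes place before `carol` learns the information. Respecting this time ordering is what separates a temporal distance from a plain shortest path in a static graph; the minimal topological distance (fewest hops) would need a separate search and may follow a different path.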
