The code review team at Meta is continuously improving the code review process. To evaluate new reviewer recommenders, we conducted three A/B tests, a type of randomized controlled experimental trial. Expt 1. We developed a new recommender based on features that had been used successfully in the literature and that could be calculated with low latency. In an A/B test on 82k diffs in Spring 2022, we found that the new recommender was more accurate and had lower latency. Expt 2. Reviewer workload is not evenly distributed, and our goal was to reduce the workload of top reviewers. We ran an A/B test of a workload-balanced recommender on 28k diff authors in Winter 2023. The test led to mixed results. Expt 3. We suspected that the bystander effect might be slowing down reviews of diffs to which only a team was assigned. We conducted an A/B test on 12.5k authors in Spring 2023 and found a large decrease in the time it took for diffs to be reviewed when a recommended individual was explicitly assigned. Our findings also suggest that historical back-testing results and A/B test findings can diverge.