Measuring causal effects in networked two-sided marketplaces is challenging because of treatment interference between participants on different sides of the market. When a treatment is applied to one side (e.g., job seekers), their interactions with the other side (e.g., job posters) introduce spillover effects that violate the Stable Unit Treatment Value Assumption (SUTVA) and bias causal estimates. Cluster-based randomization mitigates this problem, but prior approaches struggle with a fundamental trade-off: reducing spillover requires well-isolated clusters, and demanding stronger isolation leaves fewer qualifying clusters, which lowers statistical power. This paper introduces EgoCluster V3, an iterative clustering algorithm that reduces spillover threefold relative to prior versions while preserving node coverage and doubling test power. We further introduce MultiEgoCluster, which extends V3 with a two-stage procedure that first groups highly connected egos into multi-ego clusters and then applies the iterative clustering algorithm. This yields an additional ~56% reduction in spillover and a ~38% increase in sample size. Both methods are deployed in production at LinkedIn and have systematically enabled high-impact two-sided marketplace experiments. Because clustering alone cannot fully eliminate residual bias, we derive a theoretical bias correction for average treatment effect (ATE) estimation based on graph structure, and we propose an approach for generalizing results to the overall population.
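To make the spillover problem concrete, the sketch below randomizes job seekers at the cluster level on a toy bipartite graph and computes a simple spillover proxy. It is a minimal illustration under stated assumptions, not the EgoCluster V3 or MultiEgoCluster algorithm: the function names `assign_clusters_to_arms` and `mean_spillover`, the edge-list representation, and the spillover definition itself (the average share of a seeker's co-connected seekers that sit in the opposite arm) are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def assign_clusters_to_arms(clusters, seed=0):
    """Randomize at the cluster level: all seekers in a cluster share one arm."""
    rng = random.Random(seed)
    arm_of_seeker = {}
    for cluster in clusters:
        arm = rng.choice(["treatment", "control"])
        for seeker in cluster:
            arm_of_seeker[seeker] = arm
    return arm_of_seeker


def mean_spillover(edges, arm_of_seeker):
    """Illustrative spillover proxy (an assumption, not the paper's metric).

    For each seeker, look at every poster it is connected to and compute the
    share of that poster's *other* seekers assigned to a different arm.
    Average over posters, then over seekers.
    """
    seekers_of_poster = defaultdict(set)
    posters_of_seeker = defaultdict(set)
    for seeker, poster in edges:
        if seeker in arm_of_seeker:  # ignore seekers outside the experiment
            seekers_of_poster[poster].add(seeker)
            posters_of_seeker[seeker].add(poster)

    scores = []
    for seeker, posters in posters_of_seeker.items():
        shares = []
        for poster in posters:
            others = seekers_of_poster[poster] - {seeker}
            if others:
                mismatched = sum(
                    arm_of_seeker[o] != arm_of_seeker[seeker] for o in others
                )
                shares.append(mismatched / len(others))
        scores.append(sum(shares) / len(shares) if shares else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy graph: seekers s1, s2 share poster p1; seekers s3, s4 share poster p2.
edges = [("s1", "p1"), ("s2", "p1"), ("s3", "p2"), ("s4", "p2")]

# Clusters aligned with the graph keep each poster's seekers in one arm.
aligned = assign_clusters_to_arms([["s1", "s2"], ["s3", "s4"]], seed=0)
print(mean_spillover(edges, aligned))
```

Under this proxy, clusters that follow the graph structure keep every poster's seekers in a single arm, so the measured spillover is zero. Splitting co-connected seekers across arms, as seeker-level randomization can do, pushes it toward one.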