Towards Reliable Social A/B Testing: Spillover-Contained Clustering with Robust Post-Experiment Analysis

A/B testing is the foundation of decision-making on online platforms, yet social products often suffer from network interference: user interactions cause treatment effects to spill over into the control group. Such spillovers bias causal estimates and undermine experimental conclusions. Existing approaches face key limitations: user-level randomization ignores network structure, while cluster-based methods often rely on general-purpose clustering that is not tailored to spillover containment and struggles to balance unbiasedness and statistical power at scale. We propose a spillover-contained experimentation framework with two stages. In the pre-experiment stage, we build social interaction graphs and introduce a Balanced Louvain algorithm that produces stable, size-balanced clusters while minimizing cross-cluster edges, enabling reliable cluster-based randomization. In the post-experiment stage, we develop a tailored CUPAC estimator that leverages pre-experiment behavioral covariates to reduce the variance induced by cluster-level assignment, thereby improving statistical power. Together, these components provide both structural spillover containment and robust statistical inference. We validate our approach through large-scale social sharing experiments on Kuaishou, a platform serving hundreds of millions of users. Results show that our method substantially reduces spillover and yields more accurate assessments of social strategies than traditional user-level designs, establishing a reliable and scalable framework for networked A/B testing.
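To make the post-experiment stage concrete, the following is a minimal sketch of a CUPAC-style adjustment at the cluster level, not the tailored estimator the abstract refers to. It assumes that outcomes have already been aggregated to one value per cluster, that a plain least-squares model fitted on historical (pre-experiment) data stands in for the predictor, and that the prediction is used as a single control variate. All function and variable names (`fit_linear_predictor`, `cupac_effect`, `x_hist`, and so on) are hypothetical.

```python
import numpy as np


def fit_linear_predictor(x_hist, y_hist):
    """Fit least squares on historical (pre-experiment) cluster-level data.

    Assumption: a linear model stands in for whatever predictor is used in
    practice; CUPAC only needs a prediction built from pre-experiment data.
    """
    design = np.column_stack([np.ones(len(x_hist)), x_hist])
    coef, *_ = np.linalg.lstsq(design, y_hist, rcond=None)
    return coef


def predict(coef, x):
    """Apply the fitted linear predictor to cluster-level covariates."""
    design = np.column_stack([np.ones(len(x)), x])
    return design @ coef


def difference_in_means(y, treated):
    """Unadjusted effect estimate and standard error, one row per cluster."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    t, c = y[treated], y[~treated]
    effect = t.mean() - c.mean()
    se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
    return effect, se


def cupac_effect(y, pred, treated):
    """Effect estimate after subtracting the part of y explained by pred.

    theta is the usual control-variate coefficient cov(y, pred) / var(pred).
    Because pred depends only on pre-experiment data, centering and
    subtracting it leaves the expected difference in means unchanged.
    """
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    theta = np.cov(y, pred)[0, 1] / np.var(pred, ddof=1)
    y_adj = y - theta * (pred - pred.mean())
    return difference_in_means(y_adj, treated)
```

In use, `fit_linear_predictor` would be trained on a pre-experiment window, `predict` applied to the same covariates for the experiment's clusters, and the standard error from `cupac_effect` compared with the one from `difference_in_means`. The more of the cluster-level outcome variance the pre-experiment prediction explains, the larger the reduction in standard error.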
