Bipartite experiments are randomized experiments in which treatment is applied to a set of units (randomization units) that differs from the units of analysis, with the two sets connected through a bipartite graph. The scale of experimentation at large online platforms demands both accurate inference in the presence of a large bipartite interference graph and a highly scalable implementation. In this paper, we describe new inference methods that enable practical, scalable analysis of bipartite experiments: (1) We propose CA-ERL, a covariate-adjusted variant of the exposure-reweighted-linear (ERL) estimator [9], which empirically yields 60-90% variance reduction. (2) We introduce a randomization-based method for inference and prove the asymptotic validity of a Wald-type confidence interval under graph sparsity assumptions. (3) We present a linear-time algorithm for randomization inference of the CA-ERL estimator, which can be easily implemented in query engines such as Presto or Spark. We evaluate our methods both on a real experiment at Meta that randomized treatment on Facebook Groups and analyzed user-level metrics, and on simulations with synthetic data. The real-world data show that, in a practical setting, our CA-ERL estimator reduces the confidence interval (CI) width by 60-90% compared to ERL. The synthetic simulations show that our randomization inference procedure achieves correct coverage across instances, whereas the CIs for the ERL estimator are too narrow for instances with large true effect sizes and overly conservative when the bipartite graph is dense.
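To make the exposure-reweighting idea and the "linear-time" claim concrete, here is a minimal sketch under stated assumptions. It is our own illustration, not the ERL formula of [9] verbatim and not the CA-ERL estimator or the randomization-inference procedure described above. It assumes a linear exposure-response model y_i = alpha_i + beta_i * x_i, exposures x_i = sum_j w_ij z_j over a weighted edge list, and independent Bernoulli(p) treatment of randomization units. Under these assumptions, y_i (x_i - E[x_i]) / Var(x_i) is unbiased for beta_i, so weighting it by d_i = sum_j w_ij targets the average total effect (1/n) sum_i beta_i d_i. The function names `exposures` and `erl_style_estimate` are hypothetical, and no variance estimate or CI is computed.

```python
import random
from collections import defaultdict


def exposures(edges, z):
    """One pass over the weighted edge list (i, j, w_ij), O(|E|) time.

    Returns, per analysis unit i:
      x[i] = sum_j w_ij * z_j   (realized exposure)
      d[i] = sum_j w_ij         (exposure if every randomization unit were treated)
      s[i] = sum_j w_ij ** 2    (so Var(x_i) = p * (1 - p) * s[i] under Bernoulli(p))
    """
    x, d, s = defaultdict(float), defaultdict(float), defaultdict(float)
    for i, j, w in edges:
        x[i] += w * z[j]
        d[i] += w
        s[i] += w * w
    return x, d, s


def erl_style_estimate(edges, z, y, p):
    """Hypothetical exposure-reweighted estimate of (1/n) * sum_i beta_i * d_i.

    Assumes y_i = alpha_i + beta_i * x_i and independent Bernoulli(p)
    assignment. Each term y_i * (x_i - E[x_i]) / Var(x_i) is unbiased for
    beta_i under these assumptions; multiplying by d_i targets the
    all-treated vs. all-control contrast y_i(1) - y_i(0) = beta_i * d_i.
    Total cost is O(|E| + n).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    x, d, s = exposures(edges, z)
    total = 0.0
    for i, y_i in y.items():
        if s[i] == 0:
            raise ValueError(f"analysis unit {i!r} has no weighted edges")
        var_x = p * (1 - p) * s[i]
        total += d[i] * y_i * (x[i] - p * d[i]) / var_x
    return total / len(y)


if __name__ == "__main__":
    # Toy graph: analysis units "a", "b"; randomization units 0, 1, 2.
    edges = [("a", 0, 1.0), ("a", 1, 0.5), ("b", 1, 1.0), ("b", 2, 2.0)]
    p = 0.5
    rng = random.Random(0)
    z = {j: int(rng.random() < p) for j in range(3)}
    x, _, _ = exposures(edges, z)
    # Hypothetical outcomes generated from the assumed linear model.
    y = {"a": 3.0 + 2.0 * x["a"], "b": -1.0 + 0.7 * x["b"]}
    print(erl_style_estimate(edges, z, y, p))
```

Accumulating x, d, and s in the same pass roughly mirrors one join of an edge table with an assignment table followed by a GROUP BY on the analysis unit, which is the kind of computation that maps naturally onto engines such as Presto or Spark. On a toy graph, averaging the estimate over all 2^m assignments (weighted by their probabilities) is a cheap exact check of unbiasedness under the assumed model.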