Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China's analog of TikTok). In many cases, creators are the natural unit of experimentation, but creators interfere with each other through competition for viewers' limited time and attention. "Naive" estimators currently used in practice simply ignore the interference, but in doing so incur bias on the order of the treatment effect. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, have impractically high variance. We introduce a novel Monte-Carlo estimator, based on "Differences-in-Qs" (DQ) techniques, which achieves bias that is second-order in the treatment effect, while remaining sample-efficient to estimate. On the theoretical side, our contribution is to develop a generalized theory of Taylor expansions for policy evaluation, which extends DQ theory to all major MDP formulations. On the practical side, we implement our estimator on Douyin's experimentation platform, and in the process develop DQ into a truly "plug-and-play" estimator for interference in real-world settings: one which provides robust, low-bias, low-variance treatment effect estimates; admits computationally cheap, asymptotically exact uncertainty quantification; and reduces MSE by 99% compared to the best existing alternatives in our applications.