Estimating the effects of long-term treatments through A/B testing is challenging. Treatments such as updates to product functionality, user interface designs, and recommendation algorithms are intended to persist in the system for a long time after their initial launch. However, because of the constraints on running long-term experiments, practitioners often rely on short-term experimental results to make product launch decisions. How to accurately estimate the effects of long-term treatments from short-term experimental data remains an open question. To address it, we introduce a longitudinal surrogate framework that decomposes long-term effects into functions of user attributes, short-term metrics, and treatment assignments. We outline identification assumptions, estimation strategies, inferential techniques, and validation methods under this framework. Empirically, using data from two real-world experiments, each involving more than a million users on WeChat, one of the world's largest social networking platforms, we demonstrate that our approach outperforms existing solutions.