On streaming platforms, churn is extremely costly, yet A/B tests are typically evaluated using outcomes observed within a limited experimental horizon. Even when both short-term and predicted long-term engagement metrics are considered, they may fail to capture how a treatment affects user retention. Consequently, an intervention may appear beneficial in the short term and neutral in the long term while still generating lower total value than the control because of user churn. To address this limitation, we introduce a method that estimates long-term treatment effects (LTE) and the change in residual lifetime value ($\Delta\mathrm{ERLV}$) from short, multi-cohort A/B tests under user learning. To estimate time-varying treatment effects efficiently, we propose an inverse-variance weighted estimator that combines estimates from multiple cohorts, reducing variance relative to standard approaches in the literature. The estimated treatment-effect trajectory is then modeled as a parametric decay to recover both the asymptotic treatment effect and the cumulative value generated over time. Our framework enables simultaneous evaluation of steady-state impact and residual user value within a single experiment. Empirical results show improved precision in estimating LTE and $\Delta\mathrm{ERLV}$, and identify scenarios in which relying on short-term or long-term metrics alone would lead to incorrect product decisions.
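The two estimation steps described above (pooling cohort-level estimates by inverse variance, then fitting a parametric decay to the pooled trajectory) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that at each time since exposure the per-cohort effect estimates are independent and unbiased for the same effect. It also assumes an exponential decay $\tau(t) = \tau_\infty + (\tau_0 - \tau_\infty)e^{-kt}$ with $k$ chosen by grid search, because the abstract does not specify the decay family or the fitting procedure. The function names `pool_cohorts`, `fit_decay` and `cumulative_effect` are illustrative. `cumulative_effect` returns an undiscounted integral of $\tau(t)$, which is only a simplified stand-in for cumulative value and not the paper's definition of $\Delta\mathrm{ERLV}$.

```python
import math
from typing import Sequence, Tuple


def pool_cohorts(estimates: Sequence[float],
                 std_errors: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of per-cohort estimates at one time point.

    Assumes (hypothetically) independent, unbiased cohort estimates of the
    same effect. Returns (pooled_estimate, pooled_standard_error).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def fit_decay(times: Sequence[float], effects: Sequence[float],
              std_errors: Sequence[float],
              k_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Fit tau(t) = tau_inf + (tau_0 - tau_inf) * exp(-k t) by weighted least squares.

    The exponential form and the grid search over k are illustrative assumptions.
    For a fixed k the model y = a + b * exp(-k t) is linear in (a, b), so a and b
    are solved from the 2x2 weighted normal equations.
    Returns (tau_inf, tau_0, k).
    """
    w = [1.0 / se ** 2 for se in std_errors]
    best = None
    for k in k_grid:
        x = [math.exp(-k * t) for t in times]
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, effects))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, effects))
        det = sw * sxx - sx * sx
        if abs(det) < 1e-12:  # e.g. k = 0 makes the two basis columns identical
            continue
        a = (sxx * sy - sx * sxy) / det  # tau_inf
        b = (sw * sxy - sx * sy) / det   # tau_0 - tau_inf
        sse = sum(wi * (yi - a - b * xi) ** 2
                  for wi, xi, yi in zip(w, x, effects))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    if best is None:
        raise ValueError("no identifiable decay rate in k_grid")
    _, a, b, k = best
    return a, a + b, k


def cumulative_effect(tau_inf: float, tau_0: float, k: float,
                      horizon: float) -> float:
    """Undiscounted integral of tau(t) over [0, horizon] (simplified proxy)."""
    return tau_inf * horizon + (tau_0 - tau_inf) * (1.0 - math.exp(-k * horizon)) / k
```

In use, one would call `pool_cohorts` once per time-since-exposure bucket, pass the pooled estimates and standard errors to `fit_decay`, and read $\tau_\infty$ as the steady-state effect. Solving the linear parameters in closed form and searching only over $k$ keeps the sketch dependency-free. A production version would more likely use a nonlinear least-squares solver and propagate uncertainty into $\tau_\infty$.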