I examine a conceptual model of a recommendation system (RS) with user inflow and churn dynamics. When inflow and churn balance out, the user distribution reaches a steady state. Changing the recommendation algorithm alters the steady state and creates a transition period. During this period, the RS behaves differently from its new steady state. In particular, A/B experiment metrics obtained in transition periods are biased indicators of the RS's long-term performance. Scholars and practitioners, however, often conduct A/B tests shortly after introducing new algorithms to validate their effectiveness. This A/B experiment paradigm, widely regarded as the gold standard for assessing RS improvements, may consequently yield false conclusions. I also briefly touch on the data bias caused by the user retention dynamics.
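To make the transition-period bias concrete, here is a minimal sketch rather than the model analyzed in this work. It assumes two hypothetical user segments ("casual" and "power"), a constant inflow per period, a per-period churn probability that depends on the algorithm, and deterministic expected-value dynamics. All names and numbers are invented for illustration. The sketch starts from the steady state under an old algorithm A, switches to a new algorithm B, and tracks the mean engagement per active user over time.

```python
# Illustrative only: segments, rates, and engagement values are assumptions,
# not parameters taken from the paper.

def steady_state(inflow, churn):
    """Population per segment at which inflow equals churn: n = inflow / churn."""
    return [i / c for i, c in zip(inflow, churn)]

def mean_engagement(pop, engagement):
    """Average engagement per active user, a typical A/B test metric."""
    return sum(n * e for n, e in zip(pop, engagement)) / sum(pop)

def simulate(pop, inflow, churn, engagement, periods):
    """Record the metric each period, then apply churn followed by inflow."""
    pop = list(pop)
    metrics = []
    for _ in range(periods):
        metrics.append(mean_engagement(pop, engagement))
        pop = [n * (1 - c) + i for n, i, c in zip(pop, inflow, churn)]
    return metrics

INFLOW = [80.0, 20.0]        # new casual / power users per period (assumed)
CHURN_A = [0.4, 0.1]         # churn probabilities under algorithm A (assumed)
ENGAGE_A = [1.0, 5.0]        # engagement per user under algorithm A (assumed)
CHURN_B = [0.2, 0.1]         # algorithm B retains casual users better (assumed)
ENGAGE_B = [1.2, 5.2]        # algorithm B raises engagement in both segments (assumed)

pop_a = steady_state(INFLOW, CHURN_A)
baseline = mean_engagement(pop_a, ENGAGE_A)

# Switch to B while the user distribution is still A's steady state.
trajectory = simulate(pop_a, INFLOW, CHURN_B, ENGAGE_B, periods=60)
long_run = mean_engagement(steady_state(INFLOW, CHURN_B), ENGAGE_B)

print("A at steady state:      ", round(baseline, 3))
print("B right after the switch:", round(trajectory[0], 3))
print("B at its steady state:   ", round(long_run, 3))
```

With these assumed numbers, the metric measured right after the switch exceeds the baseline, while the steady-state value under B falls below it, because B retains more low-engagement users and so shifts the user mix. Whether that shift is desirable depends on the chosen metric. The point of the sketch is only that the early reading and the long-run value can differ.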