Recommender systems are essential to content-sharing platforms because they curate personalized content. To evaluate recommender-system updates that target content creators, platforms frequently rely on creator-side randomized experiments. The treatment effect measures the change in outcomes when a new algorithm is deployed in place of the status quo. We show that the standard difference-in-means estimator can be biased because of recommender interference, which arises when treated and control creators compete for exposure. We propose a "recommender choice model" that describes which item is exposed from a pool containing both treated and control items. By combining a structural choice model with neural networks, this framework directly models the interference pathway while accounting for rich viewer-content heterogeneity. We construct a debiased estimator of the treatment effect and prove that it is $\sqrt{n}$-consistent and asymptotically normal, even when samples are potentially correlated. We validate the estimator's empirical performance with a field experiment on the Weixin short-video platform. In addition to the standard creator-side experiment, we conduct a costly double-sided randomization design to obtain a benchmark estimate free from interference bias. The proposed estimator yields results comparable to the benchmark, whereas the standard difference-in-means estimator can exhibit significant bias and even produce estimates of the opposite sign.
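To make the interference argument concrete, here is a minimal toy sketch, not the paper's recommender choice model. It assumes a plain softmax (multinomial logit) exposure rule over a single candidate pool, with the treatment adding a hypothetical constant `delta` to an item's score and the outcome being the item's exposure probability. The names `exposure_probs`, `compare_estimands`, and `delta` are illustrative assumptions. Under these assumptions, exposure shares always sum to one, so treating every creator leaves average exposure unchanged, yet the difference-in-means from a mixed pool is strictly positive.

```python
import math


def exposure_probs(scores):
    """Softmax exposure probabilities for one candidate pool (toy assumption)."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def compare_estimands(base_scores, treated, delta):
    """Return (difference-in-means, all-treated vs all-control effect).

    base_scores: score of each item under the status quo
    treated:     0/1 flags from a creator-side randomization
    delta:       hypothetical score boost applied to treated items
    """
    # Creator-side experiment: treated and control items share one pool.
    mixed = exposure_probs(
        [s + delta * t for s, t in zip(base_scores, treated)]
    )
    n_treated = sum(treated)
    n_control = len(treated) - n_treated
    mean_treated = sum(p for p, t in zip(mixed, treated) if t) / n_treated
    mean_control = sum(p for p, t in zip(mixed, treated) if not t) / n_control
    dim = mean_treated - mean_control

    # Counterfactual benchmark: everyone treated vs everyone on the status quo.
    all_treated = exposure_probs([s + delta for s in base_scores])
    all_control = exposure_probs(base_scores)
    global_effect = sum(
        a - b for a, b in zip(all_treated, all_control)
    ) / len(base_scores)
    return dim, global_effect


base_scores = [0.0, 0.0, 0.0, 0.0]
treated = [1, 0, 1, 0]
dim, global_effect = compare_estimands(base_scores, treated, delta=1.0)
print(f"difference-in-means: {dim:.4f}")
print(f"all-treated vs all-control effect: {global_effect:.4f}")
```

In this toy setting the gap between the two numbers comes entirely from competition for a fixed amount of exposure: treated items gain share at the expense of control items in the same pool. With outcomes other than exposure share, or with heterogeneous boosts, the benchmark effect need not be zero.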