Effective exploration is believed to improve the long-term user experience on recommendation platforms, but determining its exact benefits has been challenging. Regular A/B tests on exploration often report neutral or even negative engagement metrics while failing to capture its long-term benefits. To address this, we present a systematic study that formally quantifies the value of exploration by examining its effects on the content corpus, a key entity in the recommender system that directly affects user experience. Specifically, we introduce new metrics and an associated experiment design to measure the benefit of exploration on corpus change, and we further connect corpus change to the long-term user experience. Furthermore, we investigate the use of the Neural Linear Bandit algorithm to build an exploration-based ranking system, and we use it as the backbone algorithm for our case study. We conduct extensive live experiments on a large-scale commercial recommendation platform that serves billions of users to validate the new experiment design, quantify the long-term value of exploration, and verify the effectiveness of the adopted Neural Linear Bandit algorithm for exploration.