Effective exploration is believed to improve the long-term user experience on recommendation platforms, but determining its exact benefits has been challenging. Regular A/B tests on exploration often show neutral or even negative effects on engagement metrics while failing to capture its long-term benefits. Here we introduce new experiment designs that formally quantify the long-term value of exploration by examining its effects on the content corpus and by connecting content corpus growth to the long-term user experience in real-world experiments. Having established the value of exploration, we investigate the Neural Linear Bandit algorithm as a general framework for introducing exploration into any deep-learning-based ranking system. We conduct live experiments on one of the largest short-form video recommendation platforms, which serves billions of users, to validate the new experiment designs, quantify the long-term value of exploration, and verify the effectiveness of the adopted Neural Linear Bandit algorithm for exploration.
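As a rough illustration of the Neural Linear Bandit idea named in the abstract, the sketch below runs Bayesian linear regression on top of a fixed embedding and picks items by Thompson sampling. It is a minimal sketch under stated assumptions, not the system evaluated in the experiments: the `embed` function, the class and parameter names (`prior_precision`, `noise_var`), the Gaussian reward model, and the choice of Thompson sampling are all assumptions made for illustration.

```python
import numpy as np


def embed(x, W):
    """Hypothetical stand-in for the last hidden layer of a trained ranking network."""
    return np.tanh(W @ x)


class NeuralLinearBandit:
    """Bayesian linear regression over fixed embeddings, with Thompson sampling.

    Assumes reward = embedding @ theta + Gaussian noise, with a Gaussian prior on theta.
    """

    def __init__(self, dim, prior_precision=1.0, noise_var=1.0, seed=0):
        self.noise_var = noise_var
        # Posterior precision matrix A and weighted reward sum b.
        self.A = prior_precision * np.eye(dim)
        self.b = np.zeros(dim)
        self.rng = np.random.default_rng(seed)

    def posterior(self):
        """Return the posterior mean and covariance of theta."""
        cov = np.linalg.inv(self.A)
        cov = (cov + cov.T) / 2.0  # symmetrize against round-off
        return cov @ self.b, cov

    def select(self, embeddings):
        """Sample theta from the posterior and return the index of the best-scoring row."""
        mean, cov = self.posterior()
        theta = self.rng.multivariate_normal(mean, cov)
        return int(np.argmax(embeddings @ theta))

    def update(self, embedding, reward):
        """Fold one (embedding, reward) observation into the posterior."""
        self.A += np.outer(embedding, embedding) / self.noise_var
        self.b += embedding * reward / self.noise_var


if __name__ == "__main__":
    # Toy simulation with synthetic items and a synthetic linear reward.
    rng = np.random.default_rng(1)
    W = rng.normal(size=(8, 16))        # frozen "network" weights (assumed)
    items = rng.normal(size=(20, 16))   # raw item features (synthetic)
    true_theta = rng.normal(size=8)     # unknown reward weights (synthetic)
    feats = np.stack([embed(x, W) for x in items])

    bandit = NeuralLinearBandit(dim=8)
    for _ in range(200):
        i = bandit.select(feats)
        reward = feats[i] @ true_theta + rng.normal(scale=0.1)
        bandit.update(feats[i], reward)
    mean, _ = bandit.posterior()
    print("chosen-item estimate error:", abs(feats[i] @ (mean - true_theta)))
```

Sampling the weights from the posterior, rather than using the posterior mean, is what produces exploration here: items whose embeddings lie in poorly observed directions occasionally receive high sampled scores and get shown.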