Applied recommender systems research is in a curious position. While there is a rigorous protocol for measuring performance by A/B testing, best practice for finding a `B' to test does not explicitly target performance but instead targets a proxy measure. The success or failure of a given A/B test then depends entirely on whether the proposed proxy is better correlated with performance than the previous proxy. No principle exists for deciding offline whether one proxy is better than another, leaving practitioners shooting in the dark. The purpose of this position paper is to question this anti-Utopian thinking and to argue that a non-standard use of deep learning stacks has the potential to unlock reward-optimizing recommendation.