Applied recommender systems research is in a curious position. While there is a rigorous protocol for measuring performance through A/B testing, best practice for finding a `B' to test does not explicitly target performance but instead targets a proxy measure. The success or failure of a given A/B test then depends entirely on whether the proposed proxy is better correlated with performance than the previous proxy. No principle exists for determining offline whether one proxy is better than another, leaving practitioners shooting in the dark. The purpose of this position paper is to question this anti-Utopian thinking and argue that a non-standard use of the deep learning stack has the potential to unlock reward-optimizing recommendation.