Web applications where users are presented with a limited selection of items have long employed ranking models to put the most relevant results first. Any feedback received from users is typically assumed to reflect a relative judgement on the utility of items, e.g. a user clicking on an item only implies it is better than items not clicked in the same ranked list. Hence, the objectives optimized in Learning-to-Rank (LTR) tend to be pairwise or listwise. Yet, by only viewing feedback as relative, we neglect the user's absolute feedback on the list's overall quality, e.g. when no items in the selection are clicked. We thus reconsider the standard LTR paradigm and argue the benefits of learning from this listwide signal. To this end, we propose the RankFormer as an architecture that, with a Transformer at its core, can jointly optimize a novel listwide assessment objective and a traditional listwise LTR objective. We simulate implicit feedback on public datasets and observe that the RankFormer succeeds in benefitting from listwide signals. Additionally, we conduct experiments in e-commerce on Amazon Search data and find the RankFormer to be superior to all baselines offline. An online experiment shows that knowledge distillation can be used to find immediate practical use for the RankFormer.
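As a rough illustration of what jointly optimizing a listwide and a listwise objective could look like, the sketch below combines the two losses for a single ranked list. The abstract does not specify the form of either loss, so every detail here is an assumption: the listwise term is a softmax cross-entropy over clicked items, the listwide term is a binary cross-entropy on whether the list received any click, and `alpha` is a hypothetical weight balancing the two. `list_score` stands in for whatever list-level output the model produces; the function names are illustrative and not taken from the paper.

```python
import math


def listwise_softmax_loss(scores, clicks):
    """Softmax cross-entropy over one list (assumed listwise objective).

    Returns 0.0 when nothing was clicked, because a list without clicks
    carries no relative preference between its items.
    """
    n_clicks = sum(clicks)
    if n_clicks == 0:
        return 0.0
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum(c * (s - log_z) for s, c in zip(scores, clicks)) / n_clicks


def listwide_loss(list_score, clicks):
    """Binary cross-entropy with logits (assumed listwide objective).

    The target is 1 if any item in the list was clicked, else 0, so a
    list with no clicks still yields an absolute training signal.
    """
    target = 1.0 if any(clicks) else 0.0
    # softplus(x) - target * x equals BCE-with-logits; stable form of softplus.
    softplus = max(list_score, 0.0) + math.log1p(math.exp(-abs(list_score)))
    return softplus - target * list_score


def joint_loss(scores, list_score, clicks, alpha=0.5):
    """Weighted sum of both terms; alpha is a hypothetical hyperparameter."""
    return listwise_softmax_loss(scores, clicks) + alpha * listwide_loss(list_score, clicks)
```

Under these assumptions, a list with no clicks contributes nothing to the listwise term but still penalizes a high `list_score` through the listwide term, which is the kind of absolute signal the abstract argues is otherwise neglected.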