Recommender systems are tasked with inferring users' evolving preferences and ranking items aligned with their intents, which calls for in-depth reasoning beyond pattern-based scoring. Recent efforts have started leveraging large language models (LLMs) for recommendation, but how to effectively optimize the model for improved recommendation utility remains underexplored. In this work, we propose Reasoning to Rank, an end-to-end training framework that internalizes recommendation utility optimization into the learning of step-by-step reasoning in LLMs. To avoid position bias in LLM reasoning and to enable direct optimization of the reasoning process, our framework performs reasoning at the user-item level and employs reinforcement learning for end-to-end training of the LLM. Experiments on three Amazon datasets and a large-scale industrial dataset show consistent gains over strong conventional and LLM-based baselines. Extensive in-depth analyses validate the necessity of the key components of the proposed framework and shed light on future developments in this line of work.