Large Language Models (LLMs) are emerging as a promising way to enhance session-based recommendation (SBR), and both prompt-based and fine-tuning-based methods have been widely investigated to align LLMs with SBR. However, prompt-based methods struggle to find prompts that elicit correct reasoning from LLMs because they lack task-specific feedback, which leads to unsatisfactory recommendations. Fine-tuning-based methods inject domain-specific knowledge into LLMs, but they incur high computational costs and depend on open-source backbones. To address these issues, we propose a \underline{Re}flective \underline{Re}inforcement \underline{L}arge \underline{L}anguage \underline{M}odel (Re2LLM) for SBR, which effectively and efficiently guides LLMs to focus on the specialized knowledge essential for more accurate recommendations. In particular, we first design the Reflective Exploration Module to extract knowledge that LLMs can readily understand and digest. Specifically, we direct LLMs to examine their recommendation errors through self-reflection and construct a knowledge base (KB) of hints capable of rectifying these errors. To efficiently elicit correct reasoning from LLMs, we further devise the Reinforcement Utilization Module, which trains a lightweight retrieval agent. The agent learns to select hints from the constructed KB based on task-specific feedback, and the selected hints serve as guidance that helps correct the LLMs' reasoning for better recommendations. Extensive experiments on multiple real-world datasets demonstrate that our method consistently outperforms state-of-the-art methods.
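The hint-selection loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three example hints, the one-hot session encoding, the `mock_llm_reward` stand-in for prompting an LLM and scoring its recommendation, and the choice of a REINFORCE-style update for a linear-softmax policy (the abstract does not name the reinforcement learning algorithm used).

```python
import math
import random

# Hypothetical hint knowledge base. In the described method these hints would
# come from LLM self-reflection on recommendation errors; here they are made up.
HINTS = [
    "(no hint)",
    "Favor items related to the most recently viewed item.",
    "Favor items the user has returned to within the session.",
]


def session_features(session):
    """Toy state encoding (an assumption): one-hot flag for repeated items."""
    has_repeats = len(set(session)) < len(session)
    return [0.0, 1.0] if has_repeats else [1.0, 0.0]


class HintPolicy:
    """Lightweight linear-softmax policy over hints, trained REINFORCE-style."""

    def __init__(self, n_hints, n_features, lr=0.1, seed=0):
        self.weights = [[0.0] * n_features for _ in range(n_hints)]
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self, features):
        logits = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self, features):
        p = self.probs(features)
        return self.rng.choices(range(len(p)), weights=p, k=1)[0]

    def update(self, features, action, reward):
        # Gradient of log pi(action | features) for a linear-softmax policy is
        # (indicator(a == action) - pi(a)) * features, scaled by the reward.
        p = self.probs(features)
        for a, row in enumerate(self.weights):
            coeff = (1.0 if a == action else 0.0) - p[a]
            for j, x in enumerate(features):
                row[j] += self.lr * reward * coeff * x


def mock_llm_reward(session, hint_index):
    """Stand-in for task feedback: prompt the LLM with the hint, score the result.

    This mock simply pretends hint 2 helps sessions with repeated items and
    hint 1 helps all other sessions. No real LLM is called.
    """
    has_repeats = len(set(session)) < len(session)
    helpful = 2 if has_repeats else 1
    return 1.0 if hint_index == helpful else 0.0


def train(policy, sessions, epochs=200):
    for _ in range(epochs):
        for session in sessions:
            features = session_features(session)
            action = policy.select(features)
            reward = mock_llm_reward(session, action)
            policy.update(features, action, reward)
    return policy
```

Calling `train(HintPolicy(len(HINTS), 2), sessions)` on a handful of item-ID sessions yields a policy whose `probs` concentrate on whichever hint the mock feedback rewards for each session type. In a real setting the reward would come from the quality of the LLM's recommendation when prompted with the selected hint, and the LLM itself would stay frozen, so only the small policy is trained.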