Large Language Models (LLMs) are emerging as promising approaches to enhance session-based recommendation (SBR), where both prompt-based and fine-tuning-based methods have been widely investigated to align LLMs with SBR. However, the former methods struggle to craft optimal prompts that elicit correct reasoning from LLMs due to the lack of task-specific feedback, leading to unsatisfactory recommendations. Although the latter methods attempt to fine-tune LLMs with domain-specific knowledge, they face limitations such as high computational costs and reliance on open-source backbones. To address these issues, we propose a Reflective Reinforcement Large Language Model (Re2LLM) for SBR, which effectively and efficiently guides LLMs to focus on the specialized knowledge essential for more accurate recommendations. In particular, we first design the Reflective Exploration Module to effectively extract knowledge that is readily understandable and digestible by LLMs. Specifically, we direct LLMs to examine recommendation errors through self-reflection and construct a knowledge base (KB) of hints capable of rectifying these errors. To efficiently elicit correct reasoning from LLMs, we further devise the Reinforcement Utilization Module to train a lightweight retrieval agent. It learns to select hints from the constructed KB based on task-specific feedback, and the selected hints serve as guidance that corrects the reasoning of LLMs for better recommendations. Extensive experiments on multiple real-world datasets demonstrate that our method consistently outperforms state-of-the-art methods.
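Since the abstract describes the two modules only at a high level, the following minimal Python sketch illustrates one plausible reading of the pipeline. Everything here is an illustrative assumption rather than the paper's actual implementation: the generic llm(prompt) -> str callable, the names reflective_exploration and RetrievalAgent, the hit-or-miss reward, and the REINFORCE-style softmax-bandit update standing in for the unspecified reinforcement learning procedure.

    import math
    import random

    def reflective_exploration(llm, train_sessions):
        """Reflective Exploration Module (sketch): collect self-reflection
        hints from recommendation errors into a knowledge base, here a
        simple list of hint strings."""
        kb = []
        for items, target in train_sessions:
            rec = llm(f"Items viewed so far: {items}. Recommend the next item.")
            if rec != target:  # a recommendation error worth reflecting on
                hint = llm(
                    f"Items viewed so far: {items}. You predicted {rec}, but "
                    f"the true next item is {target}. Give one short, reusable "
                    "hint that would have corrected this reasoning."
                )
                if hint not in kb:
                    kb.append(hint)
        return kb

    class RetrievalAgent:
        """Reinforcement Utilization Module (sketch): a lightweight softmax
        policy over KB hints, trained from task-specific reward, e.g., 1 if
        the hinted LLM hits the ground-truth item and 0 otherwise."""

        def __init__(self, kb_size, lr=0.1):
            self.prefs = [0.0] * kb_size  # one learnable preference per hint
            self.lr = lr

        def _probs(self):
            exps = [math.exp(p) for p in self.prefs]
            z = sum(exps)
            return [e / z for e in exps]

        def select(self):
            """Sample one hint index to prepend to the recommendation prompt."""
            probs = self._probs()
            return random.choices(range(len(probs)), weights=probs, k=1)[0]

        def update(self, idx, reward):
            """REINFORCE-style update: raise the preference of a hint that led
            to a correct recommendation, lower the rest proportionally."""
            probs = self._probs()
            for i in range(len(self.prefs)):
                grad = (1.0 if i == idx else 0.0) - probs[i]
                self.prefs[i] += self.lr * reward * grad

    # Toy usage with a stub LLM (assumption: any callable prompt -> str works).
    llm = lambda prompt: "item_42"
    kb = reflective_exploration(llm, [(["item_1", "item_7"], "item_9")])
    agent = RetrievalAgent(kb_size=max(len(kb), 1))
    idx = agent.select()
    agent.update(idx, reward=1.0)

At inference time, under this reading, the agent samples a hint, the hint is prepended to the LLM's recommendation prompt, and the resulting hit/miss signal is fed back as the task-specific reward that trains the retrieval policy while the LLM itself stays frozen.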