Long-tail recommendation is a challenging task for traditional recommender systems due to data sparsity and data imbalance. Recent developments in large language models (LLMs) have demonstrated their abilities in complex reasoning, which can help deduce users' preferences from very few previous interactions. However, since most LLM-based systems rely on items' semantic meaning as the sole evidence for reasoning, the collaborative information in user-item interactions is neglected, which can cause the LLM's reasoning to be misaligned with the task-specific collaborative information in the dataset. To further align LLMs' reasoning with task-specific user-item interaction knowledge, we introduce collaborative retrieval-augmented LLMs, CoRAL, which incorporate collaborative evidence directly into the prompts. Based on the retrieved user-item interactions, the LLM can analyze shared and distinct preferences among users and summarize the patterns indicating which types of users would be attracted to certain items. The retrieved collaborative evidence prompts the LLM to align its reasoning with the user-item interaction patterns in the dataset. However, since the capacity of the input prompt is limited, finding the minimally sufficient collaborative information for the recommendation task is challenging. We propose to find the optimal interaction set through a sequential decision-making process, developing a retrieval policy learned within a reinforcement learning (RL) framework. Our experimental results show that CoRAL can significantly improve LLMs' reasoning abilities on specific recommendation tasks. Our analysis also reveals that CoRAL can explore collaborative information more efficiently through reinforcement learning.
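The sequential decision-making formulation above can be illustrated with a minimal sketch: the state is the set of interactions retrieved so far, an action picks the next interaction to add to the prompt, and the reward stands in for the downstream recommendation quality the LLM achieves given the retrieved evidence. Everything below is a hypothetical toy instantiation, not the paper's actual policy, features, or reward: we use a linear softmax policy over made-up interaction features, a stub reward, and a plain REINFORCE update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each candidate user-item interaction is
# described by a random feature vector; the budget caps how many
# interactions fit into the prompt.
n_candidates, n_features, budget = 20, 4, 5
features = rng.normal(size=(n_candidates, n_features))
theta = np.zeros(n_features)  # linear softmax policy parameters


def select_interactions(theta):
    """Sequentially sample a prompt-sized subset without replacement,
    accumulating the gradient of the log-probability of the rollout."""
    chosen = []
    grad_logp = np.zeros(n_features)
    mask = np.ones(n_candidates, dtype=bool)
    for _ in range(budget):
        logits = features @ theta
        logits[~mask] = -np.inf          # already-chosen items are excluded
        probs = np.exp(logits - logits[mask].max())
        probs /= probs.sum()
        a = rng.choice(n_candidates, p=probs)
        # grad of log softmax: x_a - E_p[x]
        grad_logp += features[a] - probs @ features
        chosen.append(a)
        mask[a] = False
    return chosen, grad_logp


def reward(chosen):
    """Stub reward: stands in for the recommendation score the LLM
    would produce from the retrieved collaborative evidence. Here the
    useful signal is (arbitrarily) feature 0."""
    return features[chosen, 0].sum()


# REINFORCE with a moving-average baseline: rollouts whose reward beats
# the baseline have their selection probabilities increased.
baseline, lr = 0.0, 0.05
for step in range(300):
    chosen, grad_logp = select_interactions(theta)
    r = reward(chosen)
    baseline = 0.9 * baseline + 0.1 * r
    theta += lr * (r - baseline) * grad_logp
```

After training, the policy concentrates on candidates with high values of the rewarded feature, which mirrors how a learned retrieval policy would favor the interactions most useful as collaborative evidence; in CoRAL itself the reward would come from the LLM's recommendation performance rather than a stub.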