Recent agent-based recommendation frameworks aim to simulate user behavior by incorporating memory mechanisms and prompting strategies, but they struggle with hallucinated, non-existent items and with full-catalog ranking. In addition, a largely underexplored opportunity lies in leveraging LLMs' commonsense reasoning to capture user intent through substitute and complement relationships between items, which are usually implicit in datasets and difficult for traditional ID-based recommenders to capture. In this work, we propose a novel LLM-agent framework, AgentDR, which bridges LLM reasoning with scalable recommendation tools. Our approach delegates full-ranking tasks to traditional models while using LLMs to (i) integrate multiple recommendation outputs based on personalized tool suitability and (ii) reason over substitute and complement relationships grounded in user history. This design mitigates hallucination, scales to large catalogs, and enhances recommendation relevance through relational reasoning. Through extensive experiments on three public grocery datasets, we show that our framework achieves superior full-ranking performance, yielding on average a twofold improvement over its underlying tools. We also introduce a new LLM-based evaluation metric that jointly measures semantic alignment and ranking correctness.
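To make step (i) of the abstract concrete, the following is a minimal sketch of how ranked lists from several traditional recommenders might be merged using per-user tool weights. It is not the framework's actual integration procedure: the function `fuse_rankings`, the tool names, the weights (which stand in for an LLM's judgment of tool suitability), and the use of weighted reciprocal-rank fusion with constant `k` are all assumptions made for illustration.

```python
from collections import defaultdict


def fuse_rankings(tool_rankings, tool_weights, k=60, top_n=5):
    """Merge ranked item lists with weighted reciprocal-rank fusion.

    tool_rankings: dict mapping a tool name to item IDs, best first.
    tool_weights:  dict mapping a tool name to a non-negative weight
                   (hypothetical stand-in for LLM-judged suitability).
    """
    scores = defaultdict(float)
    for tool, ranking in tool_rankings.items():
        weight = tool_weights.get(tool, 0.0)
        for rank, item in enumerate(ranking, start=1):
            # Higher-ranked items from better-suited tools score more.
            scores[item] += weight / (k + rank)
    # Sort by descending score; break ties by item ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical outputs of two traditional recommenders for one user.
rankings = {
    "sequential_model": ["milk", "bread", "eggs"],
    "collaborative_filter": ["eggs", "bread", "butter"],
}
# Hypothetical suitability weights, e.g. produced by an LLM agent.
weights = {"sequential_model": 0.7, "collaborative_filter": 0.3}
fused = fuse_rankings(rankings, weights)
```

Because the fused list contains only items that some tool returned, a scheme of this kind cannot introduce items outside the catalog, which is the property the abstract attributes to delegating ranking to traditional models.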