As interest in Large Language Models (LLMs) for recommendation grows, optimizing LLMs for recommendation (LLM4Rec) has become crucial to improving the quality of their recommendations. However, existing LLM4Rec approaches often evaluate performance on restricted candidate sets, which may not accurately reflect the models' overall ranking capability. In this paper, we investigate the full ranking capability of LLMs and propose a two-step grounding framework, BIGRec (Bi-step Grounding Paradigm for Recommendation). BIGRec first grounds LLMs to the recommendation space by fine-tuning them to generate meaningful tokens for items, and then identifies the actual items that correspond to the generated tokens. Extensive experiments on two datasets demonstrate BIGRec's superior performance, its ability to handle few-shot scenarios, and its versatility across multiple domains. Furthermore, we observe that increasing the number of training samples yields only modest marginal gains for BIGRec, implying that LLMs have limited capability to assimilate statistical information, such as popularity and collaborative filtering, because of their strong semantic priors. These findings also underline the value of integrating diverse statistical information into the LLM4Rec framework, pointing to a potential avenue for future research. Our code and data are available at https://github.com/SAI990323/Grounding4Rec.
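To make the second grounding step concrete, the following is a minimal sketch of mapping generated text to actual catalog items. The abstract does not specify how this matching is done, so everything here is an assumption for illustration: the character-trigram `embed` function stands in for a learned embedding, the L2 distance is one plausible matching criterion, and the names `embed`, `l2_distance`, `ground_to_items`, and the toy `catalog` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): counts of character trigrams.

    A real system would likely use learned representations instead.
    """
    padded = f"  {text.lower().strip()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def l2_distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in keys))


def ground_to_items(generated: str, catalog: list[str], k: int = 3) -> list[str]:
    """Rank every catalog item by distance to the generated text, keep the top k."""
    query = embed(generated)
    ranked = sorted(catalog, key=lambda title: l2_distance(query, embed(title)))
    return ranked[:k]


# Hypothetical catalog and a hypothetical string produced by the fine-tuned LLM.
catalog = ["The Matrix", "The Matrix Reloaded", "Toy Story", "Finding Nemo"]
ranked = ground_to_items("the matrix", catalog, k=2)
print(ranked)  # → ['The Matrix', 'The Matrix Reloaded']
```

Because every catalog item receives a distance, the same function can return a ranking over the whole item set rather than over a restricted candidate list, which is the evaluation setting the abstract argues for.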