Recommendation foundation models use large language models (LLMs) for recommendation by converting recommendation tasks into natural language tasks. They enable generative recommendation, which directly generates the item(s) to recommend. Traditional recommendation models instead compute a ranking score for every candidate item. Generating items directly simplifies the recommendation pipeline from multi-stage filtering to single-stage filtering. To avoid generating excessively long text when deciding which item(s) to recommend, creating LLM-compatible item IDs is essential for recommendation foundation models. In this study, we systematically examine the item indexing problem for recommendation foundation models, using P5 as the representative backbone model and replicating its results with various indexing methods. To emphasize the importance of item indexing, we first discuss the problems with several trivial item indexing methods, such as independent indexing, title indexing, and random indexing. We then propose four simple yet effective solutions: sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing. Our reproducibility study of P5 highlights the significant influence of item indexing methods on model performance, and our results on real-world datasets validate the effectiveness of our proposed solutions.
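To make the contrast between a trivial method and one of the proposed methods concrete, here is a minimal sketch of random indexing versus sequential indexing. It assumes that sequential indexing numbers items consecutively in the order they first appear as users' interaction sequences are walked through in order. This is our reading of the idea, not the reference implementation. The function names, the starting ID of 1, and the numeric range used for random IDs are hypothetical choices made for illustration. Under this assumption, items that occur next to each other in a sequence receive nearby numbers. Once written as text, such numbers may share sub-word tokens. Random IDs carry no such co-occurrence signal.

```python
import random
from typing import Dict, List


def sequential_index(user_sequences: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign consecutive integer IDs to items in order of first appearance.

    Users are visited in dict order (insertion order in Python 3.7+), and each
    user's items in interaction order. Hypothetical: IDs start at 1.
    """
    item_to_id: Dict[str, int] = {}
    next_id = 1
    for sequence in user_sequences.values():
        for item in sequence:
            if item not in item_to_id:  # keep the first ID an item receives
                item_to_id[item] = next_id
                next_id += 1
    return item_to_id


def random_index(items: List[str], seed: int = 0,
                 low: int = 1, high: int = 10000) -> Dict[str, int]:
    """Assign unique random integer IDs, unrelated to co-occurrence.

    Hypothetical: IDs are drawn without replacement from [low, high).
    """
    rng = random.Random(seed)
    ids = rng.sample(range(low, high), len(items))  # distinct values
    return dict(zip(items, ids))


if __name__ == "__main__":
    sequences = {"u1": ["shoes", "socks", "laces"], "u2": ["socks", "hat"]}
    print(sequential_index(sequences))
    # {'shoes': 1, 'socks': 2, 'laces': 3, 'hat': 4}
    print(random_index(["shoes", "socks", "laces", "hat"]))
```

In the usage example, "shoes", "socks", and "laces" are bought together by `u1` and receive IDs 1 to 3. "hat" first appears later and gets 4. The random variant only guarantees that IDs are unique.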