Collaborative filtering (CF) models, particularly graph-based approaches, have demonstrated strong performance in capturing user-item interactions for recommender systems, yet they continue to struggle in cold-start and data-sparse scenarios. The emergence of large language models (LLMs) such as GPT and LLaMA offers new possibilities for enhancing recommendation performance, especially in cold-start settings. Despite their promise, LLMs face scalability and efficiency challenges due to their high computational demands, and their ability to model complex user-item relationships remains limited. In this work, we introduce a novel perspective on leveraging LLMs for CF model initialization. Through experiments, we uncover an embedding collapse issue when scaling CF models to larger embedding dimensions. To effectively harness large-scale LLM embeddings, we propose selective initialization strategies based on random, uniform, and variance-based index sampling. A comprehensive evaluation on multiple real-world datasets demonstrates significant performance gains across various CF models, at a lower computational cost than existing LLM-based recommendation approaches.
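The three index-sampling strategies can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the helper name `select_indices`, the target dimension of 64, and the use of item-level LLM embeddings as the source matrix are all hypothetical.

```python
import numpy as np

def select_indices(llm_emb, d, strategy="variance", seed=0):
    """Pick d of the LLM embedding's D dimensions (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    D = llm_emb.shape[1]
    if strategy == "random":
        # sample d dimensions uniformly at random, without replacement
        idx = rng.choice(D, size=d, replace=False)
    elif strategy == "uniform":
        # take d evenly spaced dimensions across the full range
        idx = np.linspace(0, D - 1, num=d, dtype=int)
    elif strategy == "variance":
        # keep the d dimensions with the highest variance across items
        idx = np.argsort(llm_emb.var(axis=0))[-d:]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return np.sort(idx)

# Initialize a small CF embedding table from (assumed) item-level LLM embeddings.
llm_emb = np.random.default_rng(1).normal(size=(1000, 4096))  # e.g., LLaMA item embeddings
idx = select_indices(llm_emb, d=64, strategy="variance")
cf_init = llm_emb[:, idx]  # (1000, 64) initialization for the CF item embeddings
```

The selected columns serve only as the starting point of the CF embedding table; the embeddings are then trained as usual by the downstream CF model.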