Content-Based Collaborative Generation for Recommender Systems

Generative models have emerged as a promising tool for enhancing recommender systems. For better recommendation, it is essential to model both item content and user-item collaborative interactions within a unified generative framework. Although some existing large language model (LLM)-based methods help fuse content information with collaborative signals, they fundamentally rely on textual language generation, which is not fully aligned with the recommendation task. How to integrate content knowledge and collaborative interaction signals in a generative framework tailored to item recommendation remains an open research challenge. In this paper, we propose content-based collaborative generation for recommender systems, namely ColaRec. ColaRec is a sequence-to-sequence framework tailored to directly generating the identifier of the recommended item. Specifically, the input sequence comprises data about the items the user has interacted with, and the output sequence is the generative identifier (GID) of the recommended item. To model collaborative signals, the GIDs are constructed from a pretrained collaborative filtering model, and each user is represented as the aggregated content of their interacted items. In this way, ColaRec captures both collaborative signals and content information in a unified framework. An item indexing task is then proposed to align the content-based semantic space with the interaction-based collaborative space. In addition, a contrastive loss is introduced to ensure that items with similar collaborative GIDs have similar content representations. To verify the effectiveness of ColaRec, we conduct experiments on four benchmark datasets. Empirical results demonstrate the superior performance of ColaRec.
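Two of the mechanisms described above can be made concrete with a small sketch. The first is deriving item GIDs from a pretrained collaborative filtering model. The second is a contrastive loss that pulls together the content representations of items with similar GIDs. This is not the ColaRec implementation: the abstract does not say how GIDs are derived or how positive pairs are chosen. The sketch therefore assumes (i) hierarchical k-means over pretrained CF item embeddings, with one GID token per level, and (ii) an InfoNCE-style loss in which items sharing a GID prefix are positives. The function names `kmeans`, `build_gids`, and `gid_contrastive_loss`, the temperature `tau`, and all toy vectors are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_gids(
    cf_emb: Dict[str, Vector],
    k: int = 2,
    max_leaf: int = 2,
    prefix: Tuple[int, ...] = (),
) -> Dict[str, Tuple[int, ...]]:
    """Hypothetical GID builder: recursively cluster pretrained CF item embeddings.

    Each recursion level contributes one token (the cluster index); the last
    token enumerates the items that share a small leaf cluster.
    """
    items = sorted(cf_emb)
    labels = kmeans([cf_emb[i] for i in items], k, seed=len(prefix)) if len(items) > max_leaf else []
    if len(set(labels)) < 2:
        # Leaf, or a cluster k-means could not split: enumerate its items.
        return {item: prefix + (pos,) for pos, item in enumerate(items)}
    gids: Dict[str, Tuple[int, ...]] = {}
    for c in sorted(set(labels)):
        sub = {it: cf_emb[it] for it, lab in zip(items, labels) if lab == c}
        gids.update(build_gids(sub, k, max_leaf, prefix + (c,)))
    return gids


def _cos(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / (norm + 1e-12)


def gid_contrastive_loss(
    content: Dict[str, Vector],
    gids: Dict[str, Tuple[int, ...]],
    tau: float = 0.1,
) -> float:
    """Assumed InfoNCE-style loss: for each anchor, items whose GID shares its
    prefix (all tokens but the last) are positives; the softmax denominator
    runs over all other items."""
    items = sorted(content)
    losses = []
    for a in items:
        others = [o for o in items if o != a]
        positives = [p for p in others if gids[p][:-1] == gids[a][:-1]]
        if not positives:
            continue
        denom = sum(math.exp(_cos(content[a], content[o]) / tau) for o in others)
        for p in positives:
            num = math.exp(_cos(content[a], content[p]) / tau)
            losses.append(-math.log(num / denom))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy CF embeddings with two obvious collaborative groups: {a, b} and {c, d}.
    cf = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [5.0, 5.0], "d": [5.1, 5.0]}
    gids = build_gids(cf)
    print(gids)
    # Content vectors that agree with, and contradict, the CF grouping.
    aligned = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
    mixed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "d": [0.1, 0.9]}
    print(gid_contrastive_loss(aligned, gids), gid_contrastive_loss(mixed, gids))
```

On this toy data, `a` and `b` receive GIDs that share their first token, as do `c` and `d`. The content vectors that agree with that grouping give a much lower loss than the ones that contradict it, which is the behavior the contrastive objective is meant to encourage. The sketch produces variable-length GIDs and picks positives by exact prefix match; both are simplifications chosen for readability rather than details taken from ColaRec.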

翻译：生成模型已成为增强推荐系统的有前景工具。为获得更好的推荐效果，必须在统一的生成框架中对物品内容与用户-物品协同交互进行建模。尽管现有基于大语言模型的方法有助于融合内容信息与协同信号，但其本质上依赖于文本语言生成，与推荐任务并未完全契合。如何针对物品推荐任务，在生成框架中整合内容知识与协同交互信号，仍是一个开放的研究挑战。本文提出面向推荐系统的基于内容的协同生成方法，即ColaRec。ColaRec是一个专为直接生成推荐物品标识符而设计的序列到序列框架。具体而言，输入序列包含用户交互物品的相关数据，输出序列则代表建议物品的生成式标识符。为建模协同信号，GID通过预训练的协同过滤模型构建，用户则表示为交互物品的内容聚合。由此，ColaRec在统一框架中同时捕获协同信号与内容信息。进一步提出物品索引任务，以实现基于内容的语义空间与基于交互的协同空间之间的对齐。此外，引入对比损失以确保具有相似协同GID的物品获得相近的内容表示。为验证ColaRec的有效性，我们在四个基准数据集上开展实验。实证结果表明ColaRec具有优越性能。