Large language models (LLMs), endowed with exceptional reasoning capabilities, are adept at discerning deep user interests from historical behaviors, making them a promising avenue for advancing recommendation systems. However, a notable discrepancy persists between the sparse collaborative semantics typically found in recommendation systems and the dense token representations within LLMs. In our study, we propose a novel framework that harmoniously merges traditional recommendation models with the strengths of LLMs. We initiate this integration with the proposed Alignment Tokenization module, which transforms ItemIDs into sequences that align semantically with the LLM's token space. Additionally, we design a series of specialized supervised learning tasks aimed at aligning collaborative signals with the subtleties of natural language semantics. To ensure practical applicability, we optimize online inference by pre-caching the top-K results for each user, reducing latency and improving efficiency. Extensive experimental evidence indicates that our model markedly improves recall metrics and exhibits strong scalability.
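The serving-time optimization mentioned above can be illustrated with a minimal sketch. All names here (`build_topk_cache`, `recommend`, `score_fn`) are hypothetical and stand in for the paper's actual components; in particular, `score_fn` represents the expensive LLM-based scorer that the abstract proposes to keep out of the online path.

```python
from typing import Callable, Dict, List

def build_topk_cache(
    user_ids: List[str],
    score_fn: Callable[[str, str], float],
    items: List[str],
    k: int = 10,
) -> Dict[str, List[str]]:
    """Offline step: precompute and cache each user's top-K items.

    score_fn(user, item) is a placeholder for the costly LLM-based
    relevance scorer; it is only ever called here, not at serving time.
    """
    cache: Dict[str, List[str]] = {}
    for user in user_ids:
        ranked = sorted(items, key=lambda it: score_fn(user, it), reverse=True)
        cache[user] = ranked[:k]
    return cache

def recommend(cache: Dict[str, List[str]], user: str) -> List[str]:
    # Online step: inference reduces to a constant-time cache lookup,
    # which is what yields the latency reduction claimed in the abstract.
    return cache.get(user, [])
```

A toy usage: with `score_fn = lambda u, it: len(it)` and `items = ["a", "bb", "ccc"]`, `build_topk_cache(["u1"], score_fn, items, k=2)` caches `["ccc", "bb"]` for `u1`, and `recommend` on an unseen user falls back to an empty list. In a real deployment the cache would live in a key-value store refreshed by a periodic offline job rather than an in-process dict.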