Recent recommender systems increasingly leverage embeddings from large pre-trained language models (PLMs). However, such embeddings exhibit two key limitations: (1) PLMs are not explicitly optimized to produce structured, discriminative embedding spaces, and (2) their representations remain overly generic, often failing to capture the domain-specific semantics crucial for recommendation tasks. We present EncodeRec, an approach that aligns textual representations with recommendation objectives while learning compact, informative embeddings directly from item descriptions. EncodeRec keeps the language model's parameters frozen during recommender system training, making it computationally efficient without sacrificing semantic fidelity. Experiments on core recommendation benchmarks demonstrate its effectiveness as a backbone for sequential recommendation models and for semantic ID tokenization, with substantial gains over PLM-based and embedding-model baselines. These results underscore the pivotal role of embedding adaptation in bridging the gap between general-purpose language models and practical recommender systems.