Recent advances in Large Language Models (LLMs) have shifted recommendation systems from the discriminative paradigm to an LLM-based generative paradigm, in which the recommender autoregressively generates sequences of semantic identifiers (SIDs) for target items conditioned on historical interactions. While prevalent LLM-based recommenders have demonstrated performance gains by aligning pretrained LLMs between the language space and the SID space, modeling the SID space still faces two fundamental challenges: (1) Semantically Meaningless Initialization: SID tokens are randomly initialized, which severs the semantic linkage between the SID space and the pretrained language space from the outset; and (2) Coarse-grained Alignment: existing alignment tasks based on supervised fine-tuning (SFT) focus primarily on item-level optimization and overlook the semantics of individual tokens within SID sequences. To address these challenges, we propose TS-Rec, which integrates Token-level Semantics into LLM-based Recommenders. Specifically, TS-Rec comprises two key components: (1) Semantic-Aware Embedding Initialization (SA-Init), which initializes SID token embeddings by applying mean pooling to the pretrained embeddings of keywords extracted by a teacher model; and (2) Token-level Semantic Alignment (TS-Align), which aligns individual tokens within the SID sequence with the shared semantics of the corresponding item clusters. Extensive experiments on two real-world benchmarks demonstrate that TS-Rec consistently outperforms traditional and generative baselines across all standard metrics. These results indicate that integrating fine-grained semantic information significantly enhances the performance of LLM-based generative recommenders.
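The SA-Init step described in the abstract (mean pooling over the pretrained embeddings of teacher-extracted keywords) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `mean_pool`, `sa_init`, `PRETRAINED`, and `SID_KEYWORDS`, the SID token strings, and the toy two-dimensional vectors are all hypothetical. The sketch also assumes one embedding per keyword, whereas a real LLM vocabulary would likely split keywords into subword tokens, and it assumes that SID tokens without any known keyword simply keep whatever default initialization the model already uses.

```python
from typing import Dict, List


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sa_init(
    sid_keywords: Dict[str, List[str]],
    pretrained: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Initialize each SID token as the mean of its keywords' pretrained embeddings.

    SID tokens whose keywords are all missing from `pretrained` are skipped,
    so the caller can fall back to its default initialization (an assumption
    of this sketch, not something stated in the abstract).
    """
    init: Dict[str, List[float]] = {}
    for sid_token, keywords in sid_keywords.items():
        known = [pretrained[k] for k in keywords if k in pretrained]
        if known:
            init[sid_token] = mean_pool(known)
    return init


# Toy pretrained embedding table (hypothetical values).
PRETRAINED = {
    "guitar": [1.0, 0.0],
    "strings": [0.0, 1.0],
    "piano": [1.0, 1.0],
}

# Hypothetical keywords that a teacher model might extract for each SID token.
SID_KEYWORDS = {
    "<a_1>": ["guitar", "strings"],
    "<a_2>": ["piano"],
    "<a_3>": ["unknown-keyword"],
}

SID_INIT = sa_init(SID_KEYWORDS, PRETRAINED)
print(SID_INIT)
```

In this toy setup, `<a_1>` starts at the midpoint of its two keyword embeddings rather than at a random point, which is the semantic linkage to the pretrained language space that the abstract says random initialization severs.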