Fine-grained Semantics Integration for Large Language Model-based Recommendation

Recent advances in Large Language Models (LLMs) have shifted recommendation systems from the discriminative paradigm to the LLM-based generative paradigm, in which the recommender autoregressively generates sequences of semantic identifiers (SIDs) for target items, conditioned on historical interactions. Although prevalent LLM-based recommenders have achieved performance gains by aligning pretrained LLMs between the language space and the SID space, modeling the SID space still faces two fundamental challenges: (1) Semantically Meaningless Initialization: SID tokens are randomly initialized, which severs the semantic link between the SID space and the pretrained language space from the outset; and (2) Coarse-grained Alignment: existing SFT-based alignment tasks focus primarily on item-level optimization and overlook the semantics of individual tokens within SID sequences. To address these challenges, we propose TS-Rec, which integrates Token-level Semantics into LLM-based Recommenders. Specifically, TS-Rec comprises two key components: (1) Semantic-Aware embedding Initialization (SA-Init), which initializes each SID token embedding by mean pooling the pretrained embeddings of keywords extracted by a teacher model; and (2) Token-level Semantic Alignment (TS-Align), which aligns individual tokens within the SID sequence with the shared semantics of the corresponding item clusters. Extensive experiments on two real-world benchmarks show that TS-Rec consistently outperforms traditional and generative baselines across all standard metrics. These results indicate that integrating fine-grained semantic information significantly enhances the performance of LLM-based generative recommenders.
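To make the SA-Init idea concrete, the following is a minimal sketch of mean-pooling initialization, not the authors' implementation. It assumes that each SID token already has a list of keywords (in the paper these come from a teacher model) and that pretrained embeddings can be looked up per keyword. The names `mean_pool`, `init_sid_embedding`, the SID token `<a_12>`, and the two-dimensional toy vectors are all hypothetical; a real system would read vectors from the LLM's input embedding matrix after tokenizing each keyword, and skipping unknown keywords is an assumption made here for simplicity.

```python
from typing import Dict, List


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def init_sid_embedding(
    keywords: List[str], pretrained: Dict[str, List[float]]
) -> List[float]:
    """Initialize one SID token embedding from its keywords' pretrained vectors.

    Assumption: keywords missing from the pretrained table are skipped.
    """
    known = [pretrained[k] for k in keywords if k in pretrained]
    return mean_pool(known)


# Toy pretrained embedding table (hypothetical values, 2 dimensions).
pretrained = {
    "wireless": [1.0, 0.0],
    "headphones": [0.0, 1.0],
    "audio": [0.5, 0.5],
}

# Hypothetical keywords describing the item cluster behind one SID token.
sid_keywords = {"<a_12>": ["wireless", "headphones", "audio"]}

# Each SID token starts from the mean of its keywords' embeddings
# instead of a random vector.
sid_embeddings = {
    sid: init_sid_embedding(kws, pretrained) for sid, kws in sid_keywords.items()
}
```

Under these assumptions, the new SID token begins training at a point in the pretrained embedding space that reflects its cluster's keywords, which is the semantic link that random initialization discards.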
