Harnessing Large Language Models (LLMs) for generative recommendation has garnered significant attention due to LLMs' powerful capabilities, such as rich world knowledge and reasoning. However, a critical challenge lies in transforming recommendation data into the language space of LLMs through effective item tokenization. Existing approaches, such as ID identifiers, textual identifiers, and codebook-based identifiers, exhibit limitations in encoding semantic information, incorporating collaborative signals, or handling code assignment bias. To address these shortcomings, we propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), designed to meet the key criteria of identifiers by integrating hierarchical semantics, collaborative signals, and code assignment diversity. LETTER combines a Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias. We instantiate LETTER within two generative recommender models and introduce a ranking-guided generation loss to enhance their ranking ability. Extensive experiments across three datasets demonstrate the superiority of LETTER in item tokenization, thereby advancing the state of the art in generative recommendation.
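To illustrate how residual quantization can yield hierarchical item identifiers, the following is a minimal sketch and not the LETTER implementation. It assumes fixed, hand-written codebooks (in a Residual Quantized VAE they would be learned), and the function name `residual_quantize` and the toy values are hypothetical. The contrastive alignment loss, diversity loss, and ranking-guided generation loss are not shown.

```python
import numpy as np


def residual_quantize(embedding, codebooks):
    """Greedily quantize an embedding level by level.

    Returns one code index per codebook level and the summed reconstruction.
    This is only the lookup step; codebook training is out of scope here.
    """
    residual = np.asarray(embedding, dtype=float)
    reconstruction = np.zeros_like(residual)
    codes = []
    for codebook in codebooks:
        # Choose the code vector closest to the current residual.
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        codes.append(index)
        # Accumulate the chosen vector and pass the remainder to the next level.
        reconstruction = reconstruction + codebook[index]
        residual = residual - codebook[index]
    return codes, reconstruction


# Hypothetical toy codebooks: two levels of 2-dimensional code vectors.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]),    # coarse level
    np.array([[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]]),  # fine level
]

codes, reconstruction = residual_quantize([1.4, 0.6], codebooks)
print(codes, reconstruction)
```

In this sketch the first index captures the coarse position of the item embedding and later indices refine it, which is the sense in which the resulting code sequence is hierarchical.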