Generative Archetype-Grounded Item Representations for Sequential Recommendation

Sequential recommendation aims to predict a user's next item interaction from their historical behavior. However, the limited quality of item representations remains a critical bottleneck. While pre-trained large language models (LLMs) can provide rich semantic representations, existing approaches rely only on static encodings of fixed attributes, overlooking the crucial role that target audiences play in defining an item's identity. Moreover, the semantic space struggles to reflect actual user behavior, leaving a significant gap between semantic representations and behavioral patterns. To address these limitations, we propose GenAIR, a general framework that empowers sequential recommendation with Generative Archetype-grounded Item Representations. Specifically, we first use an LLM to analyze item metadata and infer a textual description of the item's Archetype, the conceptual profile of its ideal target audience. We then extract the corresponding embeddings in a single forward pass. To ground these generative archetypes in real-world behavior, we further introduce a behavioral calibration objective that explicitly incorporates signals from actual interactions and adjusts the structure of the embedding space to reflect empirical patterns. GenAIR integrates seamlessly with most existing models while maintaining high efficiency. Comprehensive experiments on three real-world datasets demonstrate that GenAIR significantly improves the performance of various sequential recommendation models and consistently outperforms state-of-the-art baselines. The implementation code is available at https://github.com/AI-Santiago/GenAIR.
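
To make the idea of behavioral calibration concrete, the following is a minimal sketch and not the GenAIR objective itself, whose exact form the abstract does not specify. It assumes an InfoNCE-style contrastive loss in which items that appear consecutively in a user's interaction sequence are pulled together in the embedding space. The function names `adjacent_pairs` and `calibration_loss`, the `temperature` parameter, and the choice of adjacent items as positives are all hypothetical and chosen for illustration only.

```python
import numpy as np


def adjacent_pairs(sequences):
    """Collect (item, next_item) pairs from user interaction sequences.

    Assumption: consecutive items in a sequence count as behaviorally related.
    Pairs where an item follows itself are skipped.
    """
    return [(a, b) for seq in sequences for a, b in zip(seq, seq[1:]) if a != b]


def calibration_loss(item_emb, pairs, temperature=0.1):
    """Illustrative InfoNCE-style loss over item embeddings.

    Each anchor item should score its observed next item higher than
    every other item in the catalogue (the anchor itself is masked out).
    """
    # L2-normalise rows so dot products are cosine similarities.
    z = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    anchors = np.array([a for a, _ in pairs])
    positives = np.array([b for _, b in pairs])
    rows = np.arange(len(pairs))

    logits = z[anchors] @ z.T / temperature        # shape: (num_pairs, num_items)
    logits[rows, anchors] = -np.inf                # never pick the anchor itself
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, positives].mean())
```

In this sketch, `item_emb` would be the archetype embeddings extracted from the LLM (or a learned projection of them), and the loss would be minimised alongside the recommender's usual training objective. Embeddings that already place co-occurring items close together yield a lower loss than embeddings that do not.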