Multimodal recommendation aims to recommend candidates a user may prefer based on his or her historically interacted items and their associated multimodal information. Previous studies commonly adopt an embed-and-retrieve paradigm: learn user and item representations in a shared embedding space, then retrieve similar candidate items for a user via embedding inner product. However, this paradigm suffers from high inference cost, limited interaction modeling, and false-negative issues. To address these issues, we propose a new model, MMGRec, which introduces a generative paradigm into multimodal recommendation. Specifically, we first devise a hierarchical quantization method, Graph RQ-VAE, that assigns each item a Rec-ID derived from its multimodal and collaborative filtering (CF) information. Each Rec-ID is a tuple of semantically meaningful tokens that serves as the item's unique identifier. We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items conditioned on historical interaction sequences; the model realizes the generative paradigm by autoregressively predicting the token tuple that identifies each recommended item. Moreover, we devise a relation-aware self-attention mechanism that enables the Transformer to handle non-sequential interaction sequences, exploiting pairwise relations between elements in place of absolute positional encoding. Extensive experiments demonstrate MMGRec's effectiveness compared with state-of-the-art methods.
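To make the Rec-ID assignment concrete, the sketch below shows plain residual quantization, the mechanism underlying RQ-VAE-style tokenizers: at each level, the nearest codeword to the current residual is selected and subtracted, and the collected indices form the item's token tuple. The codebook sizes, number of levels, and random embeddings are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 quantization levels, each with its own codebook of 8 vectors.
num_levels, codebook_size, dim = 3, 8, 4
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_levels)]

def assign_rec_id(item_emb, codebooks):
    """Residual quantization: at each level, pick the codeword nearest to the
    current residual and subtract it; the collected indices form the Rec-ID."""
    residual = item_emb.copy()
    rec_id = []
    for cb in codebooks:
        dists = np.linalg.norm(cb - residual, axis=1)  # L2 distance to every codeword
        idx = int(np.argmin(dists))
        rec_id.append(idx)
        residual = residual - cb[idx]  # the remainder is quantized at the next level
    return tuple(rec_id)

item_emb = rng.normal(size=dim)
rec_id = assign_rec_id(item_emb, codebooks)
print(rec_id)  # prints the 3-token Rec-ID tuple for this item
```

Because each level quantizes the residual left by the previous one, coarse semantics land in early tokens and finer distinctions in later ones, which is what makes the resulting tuple suitable for autoregressive generation.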
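The relation-aware self-attention can be sketched as standard scaled dot-product attention whose logits receive a pairwise relation bias instead of any absolute positional encoding, so the interaction set is treated as order-free. The relation matrix `R` and all weights here are random stand-ins; how the paper actually computes element pairwise relations is not specified in this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim = 5, 8  # 5 interacted items, embedding size 8 (illustrative)

X = rng.normal(size=(n, dim))            # item embeddings, no positional encoding added
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
R = rng.normal(size=(n, n)) * 0.1        # assumed pairwise relation scores between items

def relation_aware_attention(X, R):
    """Self-attention whose logits are biased by pairwise item relations
    rather than absolute positions, keeping the input permutation-aware."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(dim) + R  # relation bias replaces positional encoding
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V

out = relation_aware_attention(X, R)
print(out.shape)  # (5, 8)
```

Replacing positions with pairwise relations is what lets the Transformer consume a user's interactions as a set, matching the non-sequential nature of historical interaction data described above.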