Graph neural networks (GNNs) have revolutionized recommender systems by effectively modeling complex user-item interactions, yet data sparsity and the item cold-start problem significantly impair performance, particularly for new items with limited or no interaction history. While multimodal content offers a promising remedy, existing methods yield suboptimal representations for new items because of noise and entanglement in sparse data. To address this, we recast multimodal recommendation as a discrete semantic tokenization problem. We present Sparse-Regularized Multimodal Tokenization for Cold-Start Recommendation (MoToRec), a framework centered on a sparsely regularized Residual Quantized Variational Autoencoder (RQ-VAE) that generates a compositional semantic code of discrete, interpretable tokens. MoToRec combines three synergistic components: (1) a sparsely regularized RQ-VAE that promotes disentangled representations, (2) a novel adaptive rarity amplification mechanism that prioritizes learning for cold-start items, and (3) a hierarchical multi-source graph encoder for robust fusion with collaborative signals. Extensive experiments on three large-scale datasets demonstrate MoToRec's superiority over state-of-the-art methods in both overall and cold-start scenarios. Our work validates that discrete tokenization provides an effective and scalable alternative for mitigating the long-standing cold-start challenge.
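To make the idea of a compositional semantic code concrete, the following is a minimal sketch of the residual quantization step that an RQ-VAE generally relies on: each level picks the nearest codeword to what the previous levels left unexplained, and the sequence of chosen indices forms the item's token code. This is not the authors' implementation. The function names, the toy dimensions, and the randomly drawn codebooks are assumptions made for illustration, and the encoder, decoder, sparsity regularizer, and codebook training are all omitted.

```python
import random


def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    def sq_dist(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def residual_quantize(z, codebooks):
    """Greedy residual quantization of a single latent vector.

    z:         list of floats (the latent vector of one item)
    codebooks: one codebook per level, each a list of codewords (lists of floats)
    Returns (tokens, recon): one token id per level and the sum of the chosen codewords.
    """
    residual = list(z)
    recon = [0.0] * len(z)
    tokens = []
    for codebook in codebooks:
        k = nearest_codeword(residual, codebook)
        tokens.append(k)
        # Accumulate the reconstruction and pass the leftover to the next level.
        recon = [r + c for r, c in zip(recon, codebook[k])]
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return tokens, recon


# Toy setup with hypothetical sizes; real codebooks would be learned, not sampled.
random.seed(0)
DIM, CODEBOOK_SIZE, LEVELS = 8, 16, 3
codebooks = [
    [[random.gauss(0.0, 1.0 / 2 ** level) for _ in range(DIM)]
     for _ in range(CODEBOOK_SIZE)]
    for level in range(LEVELS)
]
z = [random.gauss(0.0, 1.0) for _ in range(DIM)]
tokens, recon = residual_quantize(z, codebooks)
print(tokens)  # one discrete token per quantization level
```

In this sketch a new item with no interaction history still receives a token sequence, because the code depends only on its content-derived latent vector; how MoToRec derives that vector and regularizes the codebooks is not specified here.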