Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constraints, which substantially degrades model performance and reliability. One effective remedy is modality completion, which reconstructs missing features so that downstream tasks can operate on modality-complete graphs. Given a query node with missing multimodal features, existing modality completion methods typically reconstruct the missing modality from information in the node itself or its neighbors. However, these methods may overlook semantically relevant context elsewhere in the graph, which contains valuable cues that are difficult to capture with simple methods such as neighborhood aggregation. In this work, we propose GRE-MC, a Graph Retrieval-Enhanced Modality Completion framework, to overcome these limitations. By introducing a modality-aware subgraph retrieval mechanism, GRE-MC selects semantically relevant subgraphs from the entire graph, providing richer contextual information for completing missing modalities. A graph transformer then jointly encodes the query node and the retrieved subgraph via global attention to complete the missing features, while a learnable sparse-routing codebook regularizes latent embeddings into compact bases for improved robustness. Extensive experiments on multimodal recommendation benchmarks demonstrate that GRE-MC consistently outperforms state-of-the-art methods, validating the effectiveness of subgraph retrieval and the joint-encoding graph transformer for robust modality completion.
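The retrieve-then-complete idea described in the abstract can be illustrated with a deliberately simplified sketch. This is not the GRE-MC implementation: the abstract does not specify how retrieval is scored or how the graph transformer and codebook are built, so the sketch assumes cosine similarity over an observed modality (text) for retrieval and substitutes a single softmax-weighted average for the graph transformer. The sparse-routing codebook is omitted. All function names, parameters, and the `temperature` value are hypothetical.

```python
import numpy as np


def retrieve_top_k(query_feat, candidate_feats, k):
    """Return indices and cosine similarities of the k candidates closest to the query.

    Assumption: retrieval is scored by cosine similarity on an observed modality.
    """
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    norms = np.linalg.norm(candidate_feats, axis=1, keepdims=True) + 1e-12
    sims = (candidate_feats / norms) @ q
    top = np.argsort(-sims)[:k]  # descending similarity
    return top, sims[top]


def complete_missing(query_text, cand_text, cand_image, k=2, temperature=0.1):
    """Estimate a missing image feature from the retrieved candidates.

    Stand-in for the joint-encoding step: a softmax over retrieval scores
    weights the retrieved nodes' image features (hypothetical simplification).
    """
    idx, sims = retrieve_top_k(query_text, cand_text, k)
    logits = sims / temperature
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ cand_image[idx], idx
```

Unlike neighborhood aggregation, the candidate pool here is every node with the target modality available, not only the query node's graph neighbors, which is the distinction the abstract draws.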