Most multi-modal knowledge graph completion (MMKGC) models use a single embedding scorer both to retrieve candidates from the full entity set and to make the final prediction. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. We therefore propose Retrieval-Augmented Discrete Diffusion (RADD), a framework that decouples retrieval from reranking for MMKGC. A relation-aware multimodal KGE retriever serves as both the global retriever and the distillation teacher, while a conditional discrete denoiser reranks by generating entity identities over the shortlist. Training combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from the retriever to the denoiser. At inference, our Diff-Rerank procedure first forms a top-$K$ shortlist with the retriever and then reranks it with the denoiser, so the denoiser can only refine candidates the retriever has already recalled, making recall a strict prerequisite for precision. Experiments on three MMKGC benchmarks show that RADD achieves the best performance, with consistent gains over strong unimodal, multimodal, and LLM-based baselines, and ablations verify the contribution of each component.
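The two-stage inference described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `diff_rerank`, the `rerank_fn` callback standing in for the conditional discrete denoiser, and the toy scores are all hypothetical, and keeping the non-shortlisted entities in retriever order is an assumption made here for illustration.

```python
from typing import Callable, List, Sequence


def diff_rerank(
    retriever_scores: Sequence[float],
    rerank_fn: Callable[[List[int]], List[float]],
    k: int,
) -> List[int]:
    """Rank all entity ids: rerank the top-k shortlist, keep the rest in retriever order."""
    # Stage 1: global retrieval, sorting every entity by its retriever score.
    order = sorted(
        range(len(retriever_scores)),
        key=lambda e: retriever_scores[e],
        reverse=True,
    )
    shortlist, tail = order[:k], order[k:]

    # Stage 2: local reranking, scoring only the shortlisted entities.
    local_scores = rerank_fn(shortlist)
    reranked = [
        e
        for _, e in sorted(
            zip(local_scores, shortlist), key=lambda pair: pair[0], reverse=True
        )
    ]

    # Assumption: entities outside the shortlist keep their retriever order.
    return reranked + tail


def toy_denoiser(shortlist: List[int]) -> List[float]:
    """Hypothetical stand-in for the denoiser: a fixed preference table."""
    preference = {4: 0.9, 2: 0.5, 1: 0.2}
    return [preference.get(e, 0.0) for e in shortlist]


ranking = diff_rerank([0.1, 0.9, 0.8, 0.3, 0.7], toy_denoiser, k=3)
print(ranking)  # [4, 2, 1, 3, 0]
```

In this toy run the retriever shortlists entities 1, 2, and 4, and the reranker moves entity 4 to the top. Entity 3 is never passed to the reranker, which is the sense in which recall at stage one bounds what reranking can achieve.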
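As a rough guide to how the three training signals might be combined, one plausible form of the objective is sketched below. The abstract does not give the formula, so everything here is an assumption: the weights $\lambda_{\mathrm{CE}}$ and $\lambda_{\mathrm{KD}}$, the temperature symbol $\tau$, the direction of the KL term, the conventional $\tau^{2}$ scaling, and the restriction of distillation to the top-$K$ shortlist $\mathcal{S}$.

$$
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{KGE}} \;+\; \lambda_{\mathrm{CE}}\,\mathcal{L}_{\mathrm{CE}} \;+\; \lambda_{\mathrm{KD}}\,\tau^{2}\,\mathrm{KL}\!\left(\operatorname{softmax}\!\left(\frac{s^{\mathrm{ret}}_{\mathcal{S}}}{\tau}\right) \,\Big\|\, \operatorname{softmax}\!\left(\frac{s^{\mathrm{den}}_{\mathcal{S}}}{\tau}\right)\right)
$$

Here $\mathcal{L}_{\mathrm{KGE}}$ denotes the retriever's KGE supervision, $\mathcal{L}_{\mathrm{CE}}$ the denoising cross-entropy, and $s^{\mathrm{ret}}_{\mathcal{S}}$, $s^{\mathrm{den}}_{\mathcal{S}}$ the retriever (teacher) and denoiser (student) scores over the shortlist.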