Multimodal fusion is crucial for rendering holistic judgments in joint decision-making systems. Because multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and are prone to suboptimal solutions, yielding unreliable and unstable performance. To address this issue, we propose a Predictive Dynamic Fusion (PDF) framework for multimodal learning. We revisit multimodal fusion from a generalization perspective and theoretically derive the predictable Collaborative Belief (Co-Belief) with Mono- and Holo-Confidence, which provably reduces the upper bound of the generalization error. Accordingly, we further propose a relative calibration strategy to calibrate the predicted Co-Belief against potential uncertainty. Extensive experiments on multiple benchmarks confirm the superiority of our method. Our code is available at https://github.com/Yinan-Xia/PDF.
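To make the core idea of confidence-driven dynamic fusion concrete, the sketch below weights each modality's prediction by a per-sample confidence score before fusing. Note this is a minimal illustrative sketch, not the paper's actual method: the maximum softmax probability is used as a generic stand-in for the derived Co-Belief (Mono- plus Holo-Confidence), and the relative calibration step is omitted. All function names here are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence(logits):
    # Proxy confidence: maximum softmax probability per sample.
    # (The paper's predictable Co-Belief is derived theoretically;
    # this simple proxy only illustrates the fusion mechanism.)
    return softmax(logits).max(axis=-1)

def dynamic_fusion(logits_a, logits_b):
    # Per-sample fusion weights from the two modalities' confidences,
    # normalized so the weights sum to one for each sample.
    c = np.stack([confidence(logits_a), confidence(logits_b)], axis=-1)
    w = c / c.sum(axis=-1, keepdims=True)
    fused = w[..., 0:1] * logits_a + w[..., 1:2] * logits_b
    return fused, w

# Example: modality A is confident, modality B is not,
# so A receives the larger fusion weight for this sample.
la = np.array([[4.0, 0.0, 0.0]])
lb = np.array([[0.3, 0.2, 0.1]])
fused, w = dynamic_fusion(la, lb)
```

In contrast to static fusion (fixed weights for all inputs), the weights here vary per sample, which is what lets a dynamic scheme down-weight a degraded modality in open environments.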