The inherent challenge of image fusion lies in capturing the correlations among multi-source images and comprehensively integrating effective information from different sources. Most existing techniques fail to perform dynamic image fusion and notably lack theoretical guarantees, leading to potential deployment risks in this field. Is it possible to conduct dynamic image fusion with a clear theoretical justification? In this paper, we answer this question from a generalization perspective. We first reveal the generalized form of image fusion and then derive a new test-time dynamic image fusion paradigm that provably reduces the upper bound of the generalization error. Specifically, we decompose the fused image into multiple components corresponding to its source data. The decomposed components represent the effective information from the source data, so the gap between them reflects the Relative Dominability (RD) of the uni-source data in constructing the fused image. Theoretically, we prove that reducing the generalization error hinges on the negative correlation between the RD-based fusion weight and the uni-source reconstruction loss. Intuitively, RD dynamically highlights the dominant regions of each source and can be naturally converted into the corresponding fusion weight, achieving robust results. Extensive experiments and in-depth analyses on multiple benchmarks confirm our findings and the superiority of our method. Our code is available at https://github.com/Yinan-Xia/TTD.
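To make the paradigm concrete, the following is a minimal, illustrative sketch of RD-based test-time fusion weighting, not the authors' implementation. It assumes the fused image has already been decomposed into per-source components, and the function names and tensor shapes are hypothetical. The fusion weights are a softmax over negated pixel-wise uni-source reconstruction losses, which realizes the negative correlation between weight and loss that the theory calls for.

```python
# Illustrative sketch only; names and shapes are assumptions, not the TTD API.
import torch
import torch.nn.functional as F

def rd_fusion_weights(components, sources):
    """Compute per-pixel fusion weights from Relative Dominability (RD).

    components: list of K tensors [C, H, W], decomposed from the fused image.
    sources:    list of K tensors [C, H, W], the corresponding source images.
    """
    # Pixel-wise uni-source reconstruction loss of each decomposed component.
    losses = torch.stack([(c - s).abs().mean(dim=0, keepdim=True)
                          for c, s in zip(components, sources)])  # [K, 1, H, W]
    # Negative correlation: lower reconstruction loss -> higher fusion weight.
    return F.softmax(-losses, dim=0)                              # [K, 1, H, W]

def dynamic_fuse(components, sources):
    """Re-fuse the decomposed components with RD-based dynamic weights."""
    w = rd_fusion_weights(components, sources)                    # [K, 1, H, W]
    return (w * torch.stack(components)).sum(dim=0)               # [C, H, W]
```

A softmax over negated losses is just one simple way to obtain weights that decrease with the reconstruction loss; any monotonically decreasing mapping from loss to weight would play the same role in this sketch.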