DiffuVolume: Diffusion Model for Volume based Stereo Matching

Stereo matching is a core component of many computer vision tasks and driving applications. Recent cost-volume-based methods have achieved great success by exploiting the rich geometric information in paired images. However, redundancy in the cost volume interferes with model training and limits performance. To construct a more precise cost volume, we pioneer the application of the diffusion model to stereo matching. Our method, termed DiffuVolume, treats the diffusion model as a cost volume filter that recurrently removes redundant information from the cost volume. Two main designs make our method non-trivial. First, to better adapt the diffusion model to stereo matching, we do not add noise directly to the image, as is traditional, but instead embed the diffusion model into a task-specific module. In this way, we outperform the traditional diffusion-based stereo matching method with a 22% improvement in end-point error (EPE) and 240 times faster inference. Second, DiffuVolume can be easily embedded into any volume-based stereo matching network, boosting performance with only a slight increase in parameters (only 2%). By adding DiffuVolume to well-performing methods, we outperform all published methods on the Scene Flow, KITTI 2012, and KITTI 2015 benchmarks and in the zero-shot generalization setting. Notably, the proposed model has ranked 1st on the KITTI 2012 leaderboard and 2nd on the KITTI 2015 leaderboard since 15 July 2023.
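The abstract reports its accuracy gain in terms of EPE (end-point error), which is commonly computed as the mean absolute difference between predicted and ground-truth disparity over valid pixels. The following is a minimal sketch of that metric and of a relative-improvement figure, not the authors' evaluation code. The function names, the validity rule (`0 < gt < max_disp`), and the default `max_disp=192` are assumptions chosen for illustration.

```python
def end_point_error(pred, gt, max_disp=192):
    """Mean absolute disparity error over valid ground-truth pixels.

    pred, gt: 2D lists (rows of floats) with the same shape.
    A pixel counts as valid when 0 < gt < max_disp; this rule and the
    default max_disp are assumptions, not taken from the abstract.
    """
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if 0 < g < max_disp:
                total += abs(p - g)
                count += 1
    if count == 0:
        raise ValueError("no valid ground-truth pixels")
    return total / count


def relative_improvement(baseline_epe, new_epe):
    """Fractional EPE reduction relative to a baseline (0.22 means 22%)."""
    return (baseline_epe - new_epe) / baseline_epe


if __name__ == "__main__":
    # Toy 2x2 disparity maps; the gt value 0.0 marks an invalid pixel.
    pred = [[1.0, 2.5], [4.0, 10.0]]
    gt = [[1.5, 2.5], [0.0, 8.0]]
    print(end_point_error(pred, gt))
    # Hypothetical EPE values, used only to show the arithmetic.
    print(relative_improvement(1.0, 0.78))
```

Under this reading, a "22% EPE improvement" means the new EPE is 22% lower than the baseline's EPE on the same data.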

