Google recently proposed DDVM, which demonstrated for the first time that a general diffusion model for image-to-image translation works impressively well on optical flow estimation without task-specific designs such as those of RAFT. However, DDVM remains closed-source and relies on expensive, private Palette-style pretraining. In this technical report, we present the first open-source reproduction of DDVM. We study several design choices and identify the ones that matter. Trained on 40k public training samples with 4 GPUs, our reproduction achieves performance comparable to the closed-source DDVM. The code and model are released at https://github.com/DQiaole/FlowDiffusion_pytorch.