SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model

Optical flow estimation aims to find the dense 2D motion field between two frames. Because of limitations in model architectures and training datasets, existing methods often rely too heavily on local cues and ignore the integrity of objects, resulting in fragmented motion estimates. We observe that the recent Segment Anything Model (SAM) demonstrates a strong ability to segment complete objects, which makes it well suited to addressing the fragmentation problem in optical flow estimation. We therefore propose embedding the frozen SAM image encoder into FlowFormer to enhance object perception. To address the challenge of exploiting SAM in depth for non-segmentation tasks such as optical flow estimation, we propose an Optical Flow Task-Specific Adaption scheme, which comprises a Context Fusion Module that fuses the SAM encoder with the optical flow context encoder, and a Context Adaption Module that adapts the SAM features to the optical flow task using a Learned Task-Specific Embedding. Our proposed SAMFlow model reaches 0.86/2.10 clean/final EPE on the Sintel training set and 3.55/12.32 EPE/F1-all on the KITTI-15 training set, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%, respectively. Furthermore, our model achieves state-of-the-art performance on the Sintel and KITTI-15 benchmarks, ranking #1 among all two-frame methods on the Sintel clean pass.
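
For readers unfamiliar with the two metrics quoted above, the following is a minimal NumPy sketch of how end-point error (EPE) and the KITTI-style outlier rate (written F1-all in the abstract) are commonly computed. It is not the authors' evaluation code: the function names `epe` and `fl_all` and the optional `valid` mask are assumptions made for illustration, and the outlier rule used here (error above 3 px and above 5% of the ground-truth magnitude) is the usual KITTI-2015 convention rather than something stated in the abstract.

```python
import numpy as np


def endpoint_error(flow_pred, flow_gt):
    """Per-pixel Euclidean distance between two (H, W, 2) flow fields."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1)


def epe(flow_pred, flow_gt, valid=None):
    """Mean end-point error over valid pixels (hypothetical helper)."""
    err = endpoint_error(flow_pred, flow_gt)
    if valid is None:
        valid = np.ones(err.shape, dtype=bool)
    return float(err[valid].mean())


def fl_all(flow_pred, flow_gt, valid=None):
    """Percentage of valid pixels counted as outliers (hypothetical helper).

    Assumes the usual KITTI-2015 rule: a pixel is an outlier when its
    error exceeds 3 px AND exceeds 5% of the ground-truth flow magnitude.
    """
    err = endpoint_error(flow_pred, flow_gt)
    mag = np.linalg.norm(flow_gt, axis=-1)
    if valid is None:
        valid = np.ones(err.shape, dtype=bool)
    outlier = (err > 3.0) & (err > 0.05 * mag)
    return float(100.0 * outlier[valid].mean())


# Toy example: a 2x2 frame where one pixel is wrong by the vector (3, 4).
flow_gt = np.zeros((2, 2, 2))
flow_pred = flow_gt.copy()
flow_pred[0, 0] = [3.0, 4.0]

print(epe(flow_pred, flow_gt))     # 1.25  (one error of 5 px averaged over 4 pixels)
print(fl_all(flow_pred, flow_gt))  # 25.0  (one outlier out of 4 pixels)
```

Lower is better for both metrics, which is why the percentages in the abstract are reductions relative to FlowFormer.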

