Optical flow estimation is particularly challenging in scenes with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Rather than representing only the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder, which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at http://amodal-flow.cs.uni-freiburg.de.
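To make the notion of a multi-layered motion field concrete, the following is a minimal sketch of one possible in-memory representation. It is not the AmodalSynthDrive label format and not part of AmodalFlowNet: the names `FlowLayer`, `visible_flow`, and `occluded_mask`, the per-object layering, and the integer `depth_order` convention (smaller means closer to the camera) are all assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FlowLayer:
    """One hypothetical layer of an amodal flow field (illustrative only)."""
    flow: np.ndarray         # (H, W, 2) per-pixel displacement (dx, dy)
    amodal_mask: np.ndarray  # (H, W) bool, full object extent incl. occluded parts
    depth_order: int         # assumption: smaller value = closer to the camera


def visible_flow(layers, background_flow):
    """Collapse the layers into a conventional single-layer (visible) flow field."""
    out = background_flow.copy()
    # Paint far-to-near so that nearer layers overwrite farther ones.
    for layer in sorted(layers, key=lambda l: l.depth_order, reverse=True):
        out[layer.amodal_mask] = layer.flow[layer.amodal_mask]
    return out


def occluded_mask(layers, index):
    """Pixels of layer `index` that are hidden behind any nearer layer."""
    target = layers[index]
    occluders = np.zeros_like(target.amodal_mask)
    for other in layers:
        if other.depth_order < target.depth_order:
            occluders |= other.amodal_mask
    return target.amodal_mask & occluders


def make_toy_layers(h=4, w=4):
    """Two overlapping toy layers: a near one (columns 0-1) and a far one (columns 1-2)."""
    near_mask = np.zeros((h, w), dtype=bool)
    near_mask[:, 0:2] = True
    far_mask = np.zeros((h, w), dtype=bool)
    far_mask[:, 1:3] = True
    near = FlowLayer(np.tile([1.0, 0.0], (h, w, 1)), near_mask, depth_order=0)
    far = FlowLayer(np.tile([0.0, 2.0], (h, w, 1)), far_mask, depth_order=1)
    return [near, far]
```

In this toy setup the far layer keeps its own motion vector at the pixels where the near layer covers it, which is the information a single-layer flow field discards. Calling `visible_flow` recovers the conventional flow, while `occluded_mask` marks where the amodal representation carries additional motion.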