Despite significant progress in deep learning-based optical flow methods, accurately estimating large displacements and repetitive patterns remains a challenge. The local features and restricted similarity search patterns used in these algorithms contribute to this issue. In addition, some existing methods suffer from slow runtime and excessive GPU memory consumption. To address these problems, this paper proposes a novel approach based on the RAFT framework. The proposed Attention-based Feature Localization (AFL) approach incorporates the attention mechanism to handle global feature extraction and address repetitive patterns. It introduces an operator that matches each pixel with its counterpart in the second frame and assigns accurate flow values. Furthermore, an Amorphous Lookup Operator (ALO) is proposed to enhance convergence speed and improve RAFT's ability to handle large displacements by reducing data redundancy in its search operator and expanding the search space for similarity extraction. The proposed method, Efficient RAFT (Ef-RAFT), achieves improvements over RAFT of 10% on the Sintel dataset and 5% on the KITTI dataset. These gains come at the cost of a 33% reduction in speed and a 13% increase in memory usage. The code is available at: https://github.com/n3slami/Ef-RAFT
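To make the large-displacement limitation concrete, the following is a minimal NumPy sketch of the baseline idea in RAFT that the abstract refers to: an all-pairs correlation volume queried through a fixed local window. It is not the proposed ALO or AFL, whose details are not given here. The function names `all_pairs_correlation` and `lookup` are hypothetical, and the sketch assumes integer coordinates and a single resolution, whereas RAFT itself uses bilinear sampling over a correlation pyramid.

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """Dot-product similarity between every pixel pair.

    f1, f2: feature maps of shape (D, H, W).
    Returns a volume of shape (H, W, H, W), scaled by 1/sqrt(D).
    """
    d = f1.shape[0]
    return np.einsum("dij,dkl->ijkl", f1, f2) / np.sqrt(d)

def lookup(corr, coords, radius):
    """Read a (2r+1) x (2r+1) window of correlations around each target.

    corr:   (H, W, H, W) correlation volume.
    coords: (H, W, 2) integer (row, col) positions in frame 2.
    Positions outside the image contribute 0.
    """
    h, w = corr.shape[:2]
    k = 2 * radius + 1
    out = np.zeros((h, w, k, k), dtype=corr.dtype)
    offsets = range(-radius, radius + 1)
    for i in range(h):
        for j in range(w):
            for a, dy in enumerate(offsets):
                for b, dx in enumerate(offsets):
                    y = coords[i, j, 0] + dy
                    x = coords[i, j, 1] + dx
                    if 0 <= y < h and 0 <= x < w:
                        out[i, j, a, b] = corr[i, j, y, x]
    return out

# Toy data (an assumption for illustration): one-hot features, with
# frame 2 equal to frame 1 shifted two columns to the right (wrapping).
H = W = 4
f1 = np.eye(H * W).reshape(H * W, H, W)
f2 = np.roll(f1, shift=2, axis=2)
corr = all_pairs_correlation(f1, f2)

# Zero initial flow: each pixel looks around its own position.
coords = np.stack(
    np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1
)
near = lookup(corr, coords, radius=1)  # window too small for the shift
far = lookup(corr, coords, radius=2)   # window reaches the true match
```

In this toy case the true match for pixel (1, 1) lies at (1, 3). The radius-1 window around (1, 1) contains only zeros, while the radius-2 window finds the match, which illustrates why a restricted search space struggles with large displacements and why widening it without adding redundant samples is attractive.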