Lack of texture often causes ambiguity in matching, and handling this issue is an important challenge in optical flow estimation. Some methods insert stacked transformer modules so that the network can use global information from the cost volume for estimation. However, global information aggregation often incurs substantial memory and time costs during training and inference, which hinders model deployment. Drawing inspiration from the traditional local region constraint, we design local similarity aggregation (LSA) and shifted local similarity aggregation (SLSA). Aggregation of the cost volume is implemented with lightweight modules that act on the feature maps. Experiments on the final pass of Sintel show that our approach requires lower cost while maintaining competitive performance.
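To make the idea of local aggregation concrete, the following is a minimal NumPy sketch of one way a similarity-weighted aggregation over a local window could look. It is an illustration under stated assumptions, not the LSA module itself: the function name `local_similarity_aggregate`, the dot-product similarity, the softmax weighting, the edge padding, and the `radius` parameter are all hypothetical choices, and the shifted variant (SLSA) is not covered.

```python
import numpy as np


def local_similarity_aggregate(features, cost, radius=1):
    """Aggregate `cost` over a local window, weighted by feature similarity.

    Hypothetical sketch; the abstract does not specify these details.
    features: (H, W, C) feature map.
    cost:     (H, W, D) per-pixel cost entries to be aggregated.
    Returns an (H, W, D) array.
    """
    h, w, _ = features.shape
    k = 2 * radius + 1

    # Edge-pad so every pixel has a full k x k neighbourhood (assumed choice).
    pad = ((radius, radius), (radius, radius), (0, 0))
    f_pad = np.pad(features, pad, mode="edge")
    c_pad = np.pad(cost, pad, mode="edge")

    logits = np.empty((h, w, k * k))
    neighbours = np.empty((h, w, k * k, cost.shape[2]))
    idx = 0
    for dy in range(k):
        for dx in range(k):
            f_n = f_pad[dy:dy + h, dx:dx + w]
            # Dot-product similarity between each pixel and this neighbour.
            logits[..., idx] = (features * f_n).sum(axis=-1)
            neighbours[..., idx, :] = c_pad[dy:dy + h, dx:dx + w]
            idx += 1

    # Numerically stable softmax over the k*k neighbours.
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of neighbouring cost entries.
    return (weights[..., None] * neighbours).sum(axis=2)
```

Because each pixel only attends to its k x k neighbourhood, the weight tensor in this sketch grows with H·W·k² rather than (H·W)², which is the kind of saving over global aggregation that the abstract refers to.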