Lack of texture often causes matching ambiguity, and handling this issue is an important challenge in optical flow estimation. Some methods insert stacked transformer modules that allow the network to use global information from the cost volume for estimation. However, global information aggregation often incurs substantial memory and time costs during training and inference, which hinders model deployment. Drawing inspiration from the traditional local region constraint, we design local similarity aggregation (LSA) and shifted local similarity aggregation (SLSA). The aggregation of the cost volume is implemented with lightweight modules that act on the feature maps. Experiments on the final pass of Sintel show that our approach requires lower computational cost while maintaining competitive performance.
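The following is a rough sketch of what aggregating a cost volume under a local region constraint can look like. It assumes that similarity is the scaled dot product between a pixel's feature vector and the feature vectors of its neighbors in a (2r+1)×(2r+1) window. A softmax over the window normalizes these similarities, and the result weights an average of the neighbors' cost features. The function name `local_similarity_aggregation`, the `radius` parameter, and the `(C, H, W)` / `(D, H, W)` layouts are hypothetical. This is not the authors' LSA or SLSA module, and it does not model the shifted variant.

```python
import numpy as np


def local_similarity_aggregation(feat, cost, radius=1):
    """Hypothetical sketch of local, similarity-weighted cost aggregation.

    feat: (C, H, W) feature map used to measure pixel similarity.
    cost: (D, H, W) per-pixel cost features to be aggregated.
    Each output pixel is a softmax-weighted average of the cost vectors in its
    (2*radius+1)^2 neighborhood, weighted by feature dot-product similarity.
    """
    C, H, W = feat.shape
    D = cost.shape[0]
    k = 2 * radius + 1
    pad = ((0, 0), (radius, radius), (radius, radius))
    fpad = np.pad(feat, pad)                    # zero-padded features
    cpad = np.pad(cost, pad)                    # zero-padded cost features
    valid = np.pad(np.ones((H, W)), radius)     # 1 inside the image, 0 in padding

    sims = np.empty((k * k, H, W))
    neigh_cost = np.empty((k * k, D, H, W))
    idx = 0
    for dy in range(k):
        for dx in range(k):
            fn = fpad[:, dy:dy + H, dx:dx + W]
            s = (feat * fn).sum(axis=0) / np.sqrt(C)
            # Padded (out-of-image) neighbors get zero weight after the softmax
            sims[idx] = np.where(valid[dy:dy + H, dx:dx + W] > 0, s, -np.inf)
            neigh_cost[idx] = cpad[:, dy:dy + H, dx:dx + W]
            idx += 1

    # Numerically stable softmax over the window axis; the center pixel is
    # always valid, so every max is finite.
    sims -= sims.max(axis=0, keepdims=True)
    w = np.exp(sims)
    w /= w.sum(axis=0, keepdims=True)

    # Convex combination of neighboring cost vectors, per pixel
    return (w[:, None] * neigh_cost).sum(axis=0)


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 6))
cost = rng.standard_normal((3, 5, 6))
out = local_similarity_aggregation(feat, cost, radius=1)
print(out.shape)  # (3, 5, 6)
```

Each pixel only compares itself with k² = (2r+1)² neighbors, so this kind of aggregation costs O(H·W·k²) similarity computations. Global attention over all pixels needs O((H·W)²). This gap is the kind of saving that motivates local aggregation over stacked global transformer modules.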