Most recent optical flow methods use convex upsampling as the final step to obtain high-resolution flow. In this work, we show and discuss several issues and limitations of this widely adopted convex upsampling approach. We propose a series of changes to resolve these issues. First, we propose to decouple the weights of the final convex upsampler, making it easier to find the correct convex combination. For the same reason, we also provide extra contextual features to the convex upsampler. Then, we increase the convex mask size with an attention-based alternative convex upsampler: Transformers for Convex Upsampling. This upsampler is based on the observation that convex upsampling can be reformulated as attention, and we propose to use local attention masks as a drop-in replacement for convex masks to increase the mask size. We provide empirical evidence that a larger mask size increases the likelihood that the correct convex combination exists. Lastly, we propose an alternative training scheme that removes bilinear interpolation artifacts from the model output. Our proposed ideas can, in principle, be applied to almost every current state-of-the-art optical flow architecture. In the FlyingChairs + FlyingThings3D training setting, solely by adapting the convex upsampler, we reduce the Sintel Clean training end-point error of RAFT from 1.42 to 1.26, of GMA from 1.31 to 1.18, and of FlowFormer from 0.94 to 0.90.
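For reference, below is a minimal sketch of the RAFT-style convex upsampler that the changes above target. The shapes follow RAFT's public implementation (3x3 neighborhood, 8x upsampling factor); the function name convex_upsample is illustrative. The softmax over the nine neighbors produces non-negative weights that sum to one, i.e. a convex combination, which is exactly the form of local attention weights; this is the reformulation referred to above.

```python
import torch
import torch.nn.functional as F

def convex_upsample(flow, mask, factor=8):
    """RAFT-style convex upsampling: each high-resolution flow vector is a
    convex combination of its 3x3 low-resolution neighborhood.
    flow: [N, 2, H, W] low-resolution flow
    mask: [N, 9*factor*factor, H, W] predicted combination logits
    """
    N, _, H, W = flow.shape
    mask = mask.view(N, 1, 9, factor, factor, H, W)
    # Softmax over the 9 neighbors yields non-negative weights summing to 1:
    # a convex combination, identical in form to local attention weights.
    mask = torch.softmax(mask, dim=2)

    # Gather each pixel's 3x3 neighborhood of flow vectors; flow magnitudes
    # are scaled by the upsampling factor to match the higher resolution.
    up_flow = F.unfold(factor * flow, kernel_size=3, padding=1)  # [N, 2*9, H*W]
    up_flow = up_flow.view(N, 2, 9, 1, 1, H, W)

    # Weighted sum over the neighborhood, then rearrange sub-pixel blocks.
    up_flow = torch.sum(mask * up_flow, dim=2)   # [N, 2, factor, factor, H, W]
    up_flow = up_flow.permute(0, 1, 4, 2, 5, 3)  # [N, 2, H, factor, W, factor]
    return up_flow.reshape(N, 2, factor * H, factor * W)
```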