We propose a new contrastive objective for learning overcomplete pixel-level features that are invariant to motion blur. Other invariances (e.g., to pose, illumination, or weather) can be learned by applying the corresponding transformations to unlabeled images during self-supervised training. We show that a simple U-Net trained with our objective produces local features that are useful for aligning the frames of an unseen video captured with a moving camera under realistic and challenging conditions. Using a carefully designed toy example, we also show that the overcomplete pixels can encode the identity of objects in an image and the pixel coordinates relative to these objects.
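To make the idea of a pixel-level contrastive objective concrete, the following is a minimal sketch and not the objective proposed here. It assumes a generic InfoNCE loss in which the feature of a pixel in one view is pulled toward the feature of the same pixel in a transformed view and pushed away from all other pixels. The function names `motion_blur_horizontal` and `pixel_info_nce`, the box-kernel blur, and the temperature value are all illustrative assumptions; in practice the feature maps would come from a network such as a U-Net rather than from random numbers.

```python
import numpy as np

def motion_blur_horizontal(img, k=5):
    """Blur each row of a 2-D image with a length-k box kernel.

    This is only a crude stand-in for motion blur (an assumption of this sketch).
    """
    kernel = np.ones(k) / k
    # mode="same" keeps the width unchanged, so pixel (i, j) in the output
    # still corresponds to pixel (i, j) in the input.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img
    )

def pixel_info_nce(feat_a, feat_b, temperature=0.1):
    """Generic InfoNCE loss over pixels (illustrative, not the paper's objective).

    feat_a, feat_b: arrays of shape (H, W, D) holding per-pixel features of two
    views of the same image. Pixel n in feat_a is the positive for pixel n in
    feat_b; every other pixel in feat_b acts as a negative.
    """
    a = feat_a.reshape(-1, feat_a.shape[-1])
    b = feat_b.reshape(-1, feat_b.shape[-1])
    # L2-normalise so that the dot product is a cosine similarity.
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    logits = a @ b.T / temperature                       # shape (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching pixel (the diagonal) as the target class.
    return float(-np.mean(np.diag(log_prob)))

# Toy usage: random features stand in for the output of a feature extractor.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 16))
loss_aligned = pixel_info_nce(features, features)
loss_misaligned = pixel_info_nce(features, np.roll(features, 1, axis=0))
```

In this toy setting, `loss_aligned` is smaller than `loss_misaligned`, because shifting one feature map by a row breaks the pixel-to-pixel correspondence that the loss rewards.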