Dense depth and surface normal predictors should be equivariant to cropping-and-resizing: cropping the input image should yield the correspondingly cropped output. However, we find that state-of-the-art depth and normal predictors, despite their strong performance, surprisingly do not respect this equivariance. The problem persists even when crop-and-resize data augmentation is employed during training. To remedy this, we propose an equivariant regularization technique, consisting of an averaging procedure and a self-consistency loss, that explicitly promotes cropping-and-resizing equivariance in depth and normal networks. Our approach can be applied to both CNN and Transformer architectures, incurs no extra cost at test time, and notably improves the supervised and semi-supervised learning performance of dense predictors on Taskonomy tasks. Finally, finetuning with our loss on unlabeled images improves not only the equivariance but also the accuracy of state-of-the-art depth and normal predictors when evaluated on NYU-v2. GitHub link: https://github.com/mikuhatsune/equivariance
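To make the equivariance property concrete, the following is a minimal sketch of how the gap between "predict, then crop" and "crop, then predict" could be measured. It is not the authors' implementation (see the repository linked above for that): the names `crop_resize`, `equivariance_gap`, `pointwise`, and `mean_centred` are hypothetical, the nearest-neighbour resize is an assumed stand-in for whatever interpolation a real pipeline uses, and the two toy "predictors" only illustrate an equivariant and a non-equivariant map.

```python
from typing import Callable, List

# H x W single-channel map; a stand-in for an input image or a predicted depth map.
Image = List[List[float]]


def crop_resize(img: Image, top: int, left: int, size: int, out: int) -> Image:
    """Crop a size x size window at (top, left), then nearest-neighbour resize to out x out."""
    return [
        [img[top + (i * size) // out][left + (j * size) // out] for j in range(out)]
        for i in range(out)
    ]


def equivariance_gap(
    f: Callable[[Image], Image], img: Image, top: int, left: int, size: int
) -> float:
    """Mean |f(T(x)) - T(f(x))| for a crop-and-resize T; 0 means f is equivariant to T."""
    out = len(img)  # resize the crop back to the original (square) resolution
    pred_of_crop = f(crop_resize(img, top, left, size, out))
    crop_of_pred = crop_resize(f(img), top, left, size, out)
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(pred_of_crop, crop_of_pred)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def pointwise(img: Image) -> Image:
    """Toy predictor acting on each pixel independently, hence equivariant to cropping."""
    return [[2.0 * v + 1.0 for v in row] for row in img]


def mean_centred(img: Image) -> Image:
    """Toy predictor that depends on global image statistics, hence not equivariant."""
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return [[v - mean for v in row] for row in img]


image = [[float(4 * i + j) for j in range(4)] for i in range(4)]
print(equivariance_gap(pointwise, image, 0, 0, 2))     # 0.0
print(equivariance_gap(mean_centred, image, 0, 0, 2))  # 5.0
```

A quantity of this form could in principle serve as a consistency penalty on unlabeled images, since it needs no ground-truth depth or normals; how the proposed averaging procedure and self-consistency loss are actually defined is not specified in the abstract.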