Despite the growing demand for accurate surface normal estimation models, existing methods use general-purpose dense prediction models, adopting the same inductive biases as other tasks. In this paper, we discuss the inductive biases needed for surface normal estimation and propose to (1) utilize the per-pixel ray direction and (2) encode the relationship between neighboring surface normals by learning their relative rotation. The proposed method can generate crisp yet piecewise-smooth predictions for challenging in-the-wild images of arbitrary resolution and aspect ratio. Compared to a recent ViT-based state-of-the-art model, our method shows stronger generalization ability, despite being trained on a dataset that is orders of magnitude smaller. The code is available at https://github.com/baegwangbin/DSINE.
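To make the two proposed inductive biases concrete, the following is a minimal NumPy sketch and is not taken from the DSINE repository. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and pixel centres at half-integer coordinates, and it assumes the relative rotation between neighboring normals is expressed in axis-angle form and applied with Rodrigues' formula. The function names `pixel_ray_directions` and `rotate_axis_angle` are hypothetical, and in the actual method the rotation would be predicted by a network rather than supplied by hand.

```python
import numpy as np


def pixel_ray_directions(height, width, fx, fy, cx, cy):
    """Unit ray direction for every pixel centre (assumed pinhole model)."""
    # Pixel-centre coordinates: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    # Back-project each pixel onto the z = 1 plane in camera coordinates.
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    # Normalize so that every ray has unit length.
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)


def rotate_axis_angle(normal, axis, angle):
    """Rotate `normal` about the unit vector `axis` by `angle` radians (Rodrigues)."""
    axis = axis / np.linalg.norm(axis)
    return (
        normal * np.cos(angle)
        + np.cross(axis, normal) * np.sin(angle)
        + axis * np.dot(axis, normal) * (1.0 - np.cos(angle))
    )


# Illustrative intrinsics for a 640x480 image (values are assumptions).
rays = pixel_ray_directions(480, 640, fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A neighbor's normal mapped to the current pixel by a hand-picked relative rotation.
neighbor_normal = np.array([0.0, 0.0, 1.0])
rotated = rotate_axis_angle(neighbor_normal, np.array([1.0, 0.0, 0.0]), np.pi / 2)
```

Because the ray map depends only on the intrinsics and the pixel grid, it can be computed for any resolution or aspect ratio and supplied to the model alongside the image.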