StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

This work addresses high-quality surface normal estimation from monocular color inputs (i.e., images and videos), a field recently revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step, which slows down estimation. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blur, and low image quality. It is also robust to transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy: a one-step normal estimator (YOSO) first derives an initial normal guess that is relatively coarse but reliable, and a semantic-guided refinement process (SG-DRN) then refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScanNetV2 and NYUv2, and in various downstream tasks, such as surface reconstruction and normal enhancement. These results show that StableNormal retains both the "stability" and the "sharpness" needed for accurate normal estimation. StableNormal represents an early attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models are publicly available at hf.co/Stable-X.
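To make the ensembling cost mentioned above concrete, the following is a minimal toy sketch, not the StableNormal pipeline. It assumes a hypothetical stochastic predictor (`noisy_prediction`) that perturbs a known ground-truth normal with Gaussian noise, and it measures the mean angular error when several such predictions are averaged and renormalized. All function names, the noise level `sigma`, and the trial count are illustrative assumptions.

```python
import math
import random


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angular_error_deg(a, b):
    """Angle in degrees between two unit normals."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))


def noisy_prediction(true_normal, sigma, rng):
    """Hypothetical stand-in for one stochastic sample of a normal estimator."""
    return normalize(tuple(c + rng.gauss(0.0, sigma) for c in true_normal))


def ensemble_prediction(true_normal, sigma, n_samples, rng):
    """Average n_samples stochastic predictions, then renormalize."""
    samples = [noisy_prediction(true_normal, sigma, rng) for _ in range(n_samples)]
    mean = tuple(sum(s[i] for s in samples) / n_samples for i in range(3))
    return normalize(mean)


def mean_error(n_samples, trials=2000, sigma=0.3, seed=0):
    """Mean angular error (degrees) of the ensemble over many trials."""
    rng = random.Random(seed)
    true_normal = (0.0, 0.0, 1.0)
    total = 0.0
    for _ in range(trials):
        pred = ensemble_prediction(true_normal, sigma, n_samples, rng)
        total += angular_error_deg(pred, true_normal)
    return total / trials


if __name__ == "__main__":
    # Each ensemble member costs one full inference pass in a real system.
    for n in (1, 4, 16):
        print(f"ensemble size {n:2d}: mean angular error {mean_error(n):.2f} deg")
```

In this toy setting the error shrinks as the ensemble grows, but every added member is another full inference pass. This is the trade-off that a lower-variance single-pass estimator aims to avoid.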
