Remarkable progress has been made in 3D reconstruction from single-view RGB-D inputs. MCC, the current state-of-the-art method in this field, achieves unprecedented success by combining vision Transformers with large-scale training. However, we identify two key limitations of MCC: 1) the Transformer decoder is inefficient at handling a large number of query points; 2) the 3D representation struggles to recover high-fidelity details. In this paper, we propose a new approach, NU-MCC, that addresses these limitations. NU-MCC includes two key innovations: a Neighborhood decoder and a Repulsive Unsigned Distance Function (Repulsive UDF). First, our Neighborhood decoder introduces center points as an efficient proxy for the input visual features, allowing each query point to attend only to a small neighborhood. This design not only yields much faster inference but also enables the exploitation of finer-scale visual features for improved recovery of 3D textures. Second, our Repulsive UDF is a novel alternative to the occupancy field used in MCC and significantly improves the quality of 3D object reconstruction. Whereas standard UDFs tend to leave holes in their results, the proposed Repulsive UDF achieves more complete surface reconstruction. Experimental results demonstrate that NU-MCC learns a strong 3D representation, significantly advancing the state of the art in single-view 3D reconstruction. In particular, it outperforms MCC by 9.7% in F1-score on the CO3D-v2 dataset while running more than 5x faster.
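To make the neighborhood idea concrete, the following is a minimal sketch of attention restricted to the k nearest center points of a query. It is an illustration under stated assumptions, not the NU-MCC implementation: the function names, the Euclidean k-nearest-neighbor selection, the reuse of center features as both keys and values, and the single-head scaled dot-product form are all hypothetical choices made for this example.

```python
import math


def knn_indices(query, centers, k):
    """Return the indices of the k centers closest to `query` (Euclidean)."""
    order = sorted(range(len(centers)), key=lambda i: math.dist(query, centers[i]))
    return order[:k]


def neighborhood_attention(query, q_feat, centers, c_feats, k):
    """Attend from one query point to its k nearest center points only.

    query   : 3D coordinate of the query point
    q_feat  : feature vector of the query
    centers : list of 3D center coordinates
    c_feats : one feature vector per center, used here as both key and
              value (an assumption of this sketch)
    """
    idx = knn_indices(query, centers, k)
    scale = math.sqrt(len(q_feat))

    # Scaled dot-product scores against the k selected centers only.
    scores = [sum(a * b for a, b in zip(q_feat, c_feats[i])) / scale for i in idx]

    # Numerically stable softmax over the neighborhood.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the neighborhood's value vectors.
    dim = len(c_feats[0])
    out = [sum(w * c_feats[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, idx, weights


if __name__ == "__main__":
    centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
    c_feats = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
    out, idx, weights = neighborhood_attention(
        (0.1, 0.0, 0.0), [1.0, 0.0], centers, c_feats, k=2
    )
    print(idx, weights, out)
```

In this sketch each query scores only k centers rather than every input token, so the per-query attention cost scales with k instead of with the full input length. The brute-force neighbor search is kept for clarity; a spatial index would be needed for large numbers of centers.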