Generalizable neural surface reconstruction techniques have attracted great attention in recent years. However, they suffer from low-confidence depth distributions and inaccurate surface reasoning because of the oversimplified volume rendering process they employ. In this paper, we present Reconstruction TRansformer (ReTR), a novel framework that leverages the transformer architecture to redesign the rendering process, enabling the modeling of complex render interactions. ReTR introduces a learnable $\textit{meta-ray token}$ and uses the cross-attention mechanism to simulate the interaction of the rendering process with the sampled points and to render the observed color. Moreover, by operating in a high-dimensional feature space rather than the color space, ReTR mitigates sensitivity to projected colors in the source views. These improvements yield accurate, high-confidence surface assessment. We demonstrate the effectiveness of our approach on various datasets, showing that it outperforms current state-of-the-art approaches in reconstruction quality and generalization ability. $\textit{Our code is available at }$ https://github.com/YixunLiang/ReTR.
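To make the contrast between classical volume rendering and attention-based rendering concrete, the following is a minimal sketch and not the released ReTR implementation. It assumes a single attention head, omits the learned query/key/value projections and the decoding of the rendered feature into a color, and uses random values in place of learned features. All names (`volume_rendering_weights`, `cross_attention_render`, `meta_ray_token`, `point_features`) are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def volume_rendering_weights(densities, deltas):
    """Classical weights w_i = T_i * (1 - exp(-sigma_i * delta_i)),
    where T_i is the transmittance accumulated before sample i."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def cross_attention_render(meta_ray_token, point_features):
    """Single-head cross-attention: one query token attends over the
    features of the points sampled along a ray.
    Assumption: no learned projections, features act as keys and values."""
    dim = len(meta_ray_token)
    scores = [
        sum(q * k for q, k in zip(meta_ray_token, feat)) / math.sqrt(dim)
        for feat in point_features
    ]
    weights = softmax(scores)
    rendered = [
        sum(w * feat[j] for w, feat in zip(weights, point_features))
        for j in range(dim)
    ]
    return weights, rendered


random.seed(0)
num_points, dim = 8, 4
# Placeholder features; in practice these would come from a feature volume.
point_features = [
    [random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_points)
]
# Placeholder query; in practice this token would be learned during training.
meta_ray_token = [random.gauss(0.0, 1.0) for _ in range(dim)]

attn_weights, rendered_feature = cross_attention_render(meta_ray_token, point_features)

# Hand-picked densities with one dense sample, for comparison only.
densities = [0.1, 0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 0.1]
vr_weights = volume_rendering_weights(densities, [0.25] * num_points)
```

In this sketch the classical weights are fixed by the density values alone, whereas the attention weights depend on a query that can be learned, and the weighted sum is taken over features rather than colors. Both weight vectors are non-negative; the attention weights sum to one by construction, while the classical weights sum to at most one.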