Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practice is to construct variance-based cost volumes for geometry reconstruction and to encode 3D descriptors from which novel views are decoded. However, existing methods generalize poorly in challenging conditions because of inaccurate geometry, sub-optimal descriptors, and sub-optimal decoding strategies. We address these issues point by point. First, we find that the variance-based cost volume exhibits failure patterns, because the features of pixels corresponding to the same 3D point can be inconsistent across views due to occlusions or reflections. We introduce an Adaptive Cost Aggregation (ACA) approach that amplifies the contribution of consistent pixel pairs and suppresses inconsistent ones. Second, unlike previous methods that fuse only 2D features into descriptors, we introduce a Spatial-View Aggregator (SVA) that incorporates 3D context into descriptors through spatial and inter-view interaction. Third, when decoding the descriptors, we observe that the two existing decoding strategies excel in different areas and are therefore complementary, so we propose a Consistency-Aware Fusion (CAF) strategy to leverage the advantages of both. We incorporate ACA, SVA, and CAF into a coarse-to-fine framework, termed Geometry-aware Reconstruction and Fusion-refined Rendering (GeFu). GeFu attains state-of-the-art performance across multiple datasets. Code is available at https://github.com/TQTQliu/GeFu.
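To make the failure pattern of the variance-based cost concrete, the following is a minimal NumPy sketch for a single 3D point observed in several views. It is not the GeFu implementation: the function names, the choice of view 0 as reference, and the softmax-over-distance weighting are all assumptions made for illustration, whereas the abstract only states that ACA amplifies consistent pixel pairs and suppresses inconsistent ones.

```python
import numpy as np


def variance_cost(feats):
    """Variance-based cost for one 3D point.

    feats: array of shape (V, C), the C-channel feature sampled at the
    point's projection in each of V views. Returns the per-channel variance.
    """
    return feats.var(axis=0)


def weighted_pair_cost(feats, ref=0, temperature=1.0):
    """Hypothetical adaptive aggregation (illustrative, not the paper's ACA).

    Each (reference, source) pair gets a weight that decays with the pair's
    feature distance, so inconsistent pairs (e.g. occluded views) contribute
    less to the aggregated cost. Returns (per-channel cost, pair weights).
    """
    ref_feat = feats[ref]
    src_feats = np.delete(feats, ref, axis=0)        # (V-1, C)
    sq_diff = (src_feats - ref_feat) ** 2            # (V-1, C)
    dist = sq_diff.mean(axis=1)                      # (V-1,)
    logits = -dist / temperature
    weights = np.exp(logits - logits.max())          # numerically stable softmax
    weights /= weights.sum()
    cost = (weights[:, None] * sq_diff).sum(axis=0)  # (C,)
    return cost, weights


# Toy example: three views agree, the last one is "occluded" and sees
# a different surface, so its feature is an outlier.
feats = np.array([
    [1.0, 1.0],
    [1.0, 1.0],
    [1.0, 1.0],
    [5.0, 5.0],
])

var_cost = variance_cost(feats)
agg_cost, pair_weights = weighted_pair_cost(feats)

print("variance cost:", var_cost)
print("weighted cost:", agg_cost)
print("pair weights:", pair_weights)
```

In this toy case the single outlier view inflates the variance even though the point lies on a surface that the other views agree on, while the weighted aggregation assigns the inconsistent pair a near-zero weight. A learned weighting, as the abstract suggests, would replace the fixed softmax used here.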