Reconstructing category-specific objects from a single image is a challenging task that requires inferring the geometry and appearance of an object from a limited viewpoint. Existing methods typically rely on local feature retrieval based on re-projection with known camera intrinsics, which is slow and prone to distortion at viewpoints far from that of the input image. In this paper, we present Variable Radiance Field (VRF), a novel framework that efficiently reconstructs category-specific objects from a single image without known camera parameters. Our key contributions are: (1) We parameterize the geometry and appearance of the object using a multi-scale global feature extractor, which avoids frequent point-wise feature retrieval and removes the dependency on camera parameters. We also propose a contrastive learning-based pretraining strategy to improve the feature extractor. (2) We reduce the geometric complexity of the object by learning a category template, and use hypernetworks to generate a small neural radiance field for fast, instance-specific rendering. (3) We align each training instance to the template space using a learned similarity transformation, which enables semantically consistent learning across different objects. We evaluate our method on the CO3D dataset and show that it outperforms existing methods in both reconstruction quality and speed. We also demonstrate its applicability to shape interpolation and object placement tasks.
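To make contributions (2) and (3) concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a similarity transformation of the form x' = s R x + t and a single linear hypernetwork that maps a global feature vector to the weights of a tiny MLP field returning density and color. All names (`similarity_transform`, `rotation_z`, `HyperNetwork`, `small_field`), the layer sizes, and the activation choices are hypothetical and chosen only for illustration.

```python
import numpy as np


def rotation_z(angle):
    """Rotation matrix about the z-axis (illustrative helper)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def similarity_transform(points, scale, rotation, translation):
    """Map (N, 3) instance-space points to template space: x' = s * R @ x + t."""
    return scale * points @ rotation.T + translation


class HyperNetwork:
    """Assumed linear hypernetwork: global feature -> weights of a small MLP."""

    def __init__(self, feat_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Target MLP: 3 -> hidden -> 4 (1 density channel + 3 color channels).
        self.shapes = [(3, hidden), (hidden,), (hidden, 4), (4,)]
        n_params = sum(int(np.prod(s)) for s in self.shapes)
        self.proj = rng.normal(0.0, 0.1, size=(feat_dim, n_params))

    def generate(self, feature):
        """Produce instance-specific MLP parameters from one feature vector."""
        flat = feature @ self.proj
        params, offset = [], 0
        for shape in self.shapes:
            size = int(np.prod(shape))
            params.append(flat[offset:offset + size].reshape(shape))
            offset += size
        return params


def small_field(points, params):
    """Evaluate the generated MLP at (N, 3) points; returns density and RGB."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(points @ w1 + b1, 0.0)      # ReLU
    out = hidden @ w2 + b2
    density = np.logaddexp(0.0, out[:, :1])         # softplus keeps density >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))         # sigmoid keeps color in (0, 1)
    return density, rgb


# Usage: align sample points to template space, then query the generated field.
feature = np.random.default_rng(1).normal(size=32)  # stand-in for a global image feature
params = HyperNetwork(feat_dim=32).generate(feature)
points = np.random.default_rng(2).uniform(-1.0, 1.0, size=(5, 3))
aligned = similarity_transform(points, 1.5, rotation_z(0.3), np.array([0.0, 0.1, 0.0]))
density, rgb = small_field(aligned, params)
```

Because the field's weights are produced once per image from a global feature, evaluating a sample point needs only a forward pass through the small MLP, with no per-point feature lookup or camera projection; this is the property the abstract credits for the speedup.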