Encoding information from 2D views of an object into a 3D representation is crucial for generalized 3D feature extraction. Such features can then enable 3D reconstruction, 3D generation, and other applications. We propose GOEmbed (Gradient Origin Embeddings), which encodes input 2D images into any 3D representation without requiring a pre-trained image feature extractor. This stands in contrast to typical prior approaches, which either encode input images with 2D features extracted from large pre-trained models or design custom features for each 3D representation; worse still, encoders may not exist at all for specialized neural 3D representations such as MLPs and hash-grids. We extensively evaluate the proposed GOEmbed under different experimental settings on the OmniObject3D benchmark. First, we compare the mechanism against prior encoding mechanisms across multiple 3D representations in an illustrative experiment called Plenoptic-Encoding. Second, we demonstrate its efficacy by achieving a new SOTA FID of 22.12 on the OmniObject3D generation task with a combination of GOEmbed and DFM (Diffusion with Forward Models), which we call GOEmbedFusion. Finally, we evaluate how the GOEmbed mechanism bolsters sparse-view 3D reconstruction pipelines.
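As a rough intuition for the name, a gradient-origin encoding (in the spirit the name suggests) can be sketched as the gradient of a rendering loss evaluated at a zero-initialized ("origin") 3D representation, so the encoding automatically lives in the representation's own parameter space. The toy below is a hypothetical minimal sketch, not the paper's implementation: it replaces the differentiable renderer with a fixed linear map `R`, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy: a 3D "representation" z is rendered to a 2D view by a
# fixed linear operator R (a stand-in for a differentiable renderer).
# The embedding of an observed view y is the gradient of the rendering loss
#   L(z) = 0.5 * ||R z - y||^2
# evaluated at the origin z = 0, which for this linear renderer is -R^T y.

rng = np.random.default_rng(0)
R = rng.standard_normal((16, 8))   # toy "renderer": 8-dim representation -> 16-pixel view
y = rng.standard_normal(16)        # observed 2D view (flattened toy image)

z0 = np.zeros(8)                   # origin: zero-initialized 3D representation
residual = R @ z0 - y              # rendering error at the origin
grad_at_origin = R.T @ residual    # dL/dz evaluated at z = 0

embedding = -grad_at_origin        # gradient-origin-style encoding of the view
print(embedding.shape)             # (8,) -- same space as the 3D representation
```

Because the gradient has the same shape as the representation's parameters, the same recipe applies unchanged whether the representation is a voxel grid, an MLP, or a hash-grid, which is what makes this style of encoding representation-agnostic.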