In this paper, we present an approach to monocular open-set novel view synthesis (NVS) that leverages object skeletons to guide the underlying diffusion model. Building on a baseline that uses a pre-trained 2D image generator, our method exploits the Objaverse dataset, which includes animated objects with bone structures. By introducing a skeleton guide layer after the existing ray conditioning normalization (RCN) layer, our approach improves pose accuracy and multi-view consistency. The skeleton guide layer supplies detailed structural information to the generative model, raising the quality of the synthesized views. Experimental results demonstrate that our skeleton-guided method significantly improves consistency and accuracy across diverse object categories in the Objaverse dataset. Our method outperforms existing state-of-the-art NVS techniques both quantitatively and qualitatively, without relying on explicit 3D representations.