We present Chirpy3D, a novel approach to fine-grained 3D object generation that tackles the challenging task of synthesizing creative 3D objects in a zero-shot setting, with access only to unposed 2D images of seen categories. Without structured supervision (camera poses, 3D part annotations, or object-specific labels), the model must infer plausible 3D structure, capture fine-grained details, and generalize to novel objects using only category-level labels from seen categories. To address this, Chirpy3D introduces a multi-view diffusion model that decomposes training objects into anchor parts in an unsupervised manner and models the latent space of both seen and unseen parts as continuous distributions. This enables smooth interpolation and flexible recombination of parts, yielding entirely new objects with species-specific details. A self-supervised feature consistency loss further enforces structural and semantic coherence. The result is the first system capable of generating entirely novel 3D objects with fine-grained, species-specific details through flexible part sampling and composition. Our experiments demonstrate that Chirpy3D surpasses existing methods in generating creative 3D objects of higher quality and with finer-grained details. Code will be released at https://github.com/kamwoh/chirpy3d.
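To make the part-latent idea above concrete, the following is a minimal PyTorch sketch, not the released implementation: `PartLatentBank`, `feature_consistency_loss`, and all shapes and hyperparameters are illustrative assumptions. It shows how modeling each anchor part of each seen species as a Gaussian in a shared latent space permits sampling novel parts, interpolating between species, and recombining parts across species, with a consistency term that keeps two stochastic decodings of the same latents aligned.

```python
# Illustrative sketch only; names and shapes are assumptions, not Chirpy3D's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartLatentBank(nn.Module):
    """Continuous latent distributions for n_species x n_parts anchor parts."""
    def __init__(self, n_species: int, n_parts: int, dim: int):
        super().__init__()
        # Per-(species, part) Gaussian parameters in a shared latent space.
        self.mu = nn.Parameter(torch.zeros(n_species, n_parts, dim))
        self.log_var = nn.Parameter(torch.zeros(n_species, n_parts, dim))

    def sample(self, species: torch.Tensor) -> torch.Tensor:
        """Reparameterized sample of all part latents for the given species ids."""
        mu, log_var = self.mu[species], self.log_var[species]
        return mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)

    def interpolate(self, a: int, b: int, t: float) -> torch.Tensor:
        """Blend the part means of two species to form a novel hybrid."""
        return torch.lerp(self.mu[a], self.mu[b], t)


def feature_consistency_loss(feats_a: torch.Tensor,
                             feats_b: torch.Tensor) -> torch.Tensor:
    """Self-supervised consistency: features produced for the same part
    latents under two stochastic forward passes should agree."""
    return F.mse_loss(feats_a, feats_b.detach())


bank = PartLatentBank(n_species=200, n_parts=4, dim=32)
latents = bank.sample(torch.tensor([3, 7]))   # (2, 4, 32) part latents
hybrid = bank.interpolate(a=3, b=7, t=0.5)    # novel species-level blend

# Recombination: e.g., part 1 (say, the wings) of species 7 on species 3's body.
mixed = bank.mu[3].clone()
mixed[1] = bank.mu[7][1]
```

In such a setup, the sampled or recombined part latents would condition the multi-view diffusion model, and the consistency loss would be computed between its internal features across stochastic passes; those wiring details are omitted here.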