Single-image 3D generation with part-level structure remains challenging: learned priors struggle to cover the long tail of part geometries and to maintain multi-view consistency, and existing systems provide limited support for precise, localized edits. We present PartRAG, a retrieval-augmented framework that integrates an external part database with a diffusion transformer to couple generation with an editable representation. To overcome the first challenge, we introduce a Hierarchical Contrastive Retrieval module that aligns dense image patches with 3D part latents at both part and object granularity, retrieving from a curated bank of 1,236 part-annotated assets to inject diverse, physically plausible exemplars into denoising. To overcome the second challenge, we add a masked, part-level editor that operates in a shared canonical space, enabling swaps, attribute refinements, and compositional updates without regenerating the whole object, while preserving non-target parts and multi-view consistency. PartRAG achieves competitive results on Objaverse, ShapeNet, and ABO (on Objaverse, reducing Chamfer Distance from 0.1726 to 0.1528 and raising F-Score from 0.7472 to 0.844), with 38 s inference and interactive edits in 5-8 s. Qualitatively, PartRAG produces sharper part boundaries, better thin-structure fidelity, and robust behavior on articulated objects. Code: https://github.com/AIGeeksGroup/PartRAG. Website: https://aigeeksgroup.github.io/PartRAG.
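The hierarchical retrieval described above can be sketched as a blend of part-level and object-level similarity scores. This is a minimal illustration under assumed details: the function name `retrieve_parts`, the cosine-similarity scoring, the blending weight `alpha`, and the shapes of the embedding banks are all hypothetical, not taken from the paper.

```python
import numpy as np

def retrieve_parts(patch_emb, part_bank, object_emb, object_bank, k=4, alpha=0.5):
    """Hypothetical two-granularity retrieval: blend part-level and
    object-level cosine similarities over a bank of part-annotated
    assets, then return the indices of the top-k candidates."""
    def cos(query, bank):
        # Cosine similarity between one query vector and each bank row.
        q = query / np.linalg.norm(query)
        b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
        return b @ q

    # Each bank row pairs a part latent with its parent-object embedding,
    # so the two similarity vectors align index-by-index.
    score = alpha * cos(patch_emb, part_bank) + (1 - alpha) * cos(object_emb, object_bank)
    return np.argsort(-score)[:k]
```

In a full system the retrieved part latents would then condition the diffusion transformer during denoising; here the sketch only shows how mixing the two granularities lets an object-level match break ties among visually similar parts.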