Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

Recent advances in 3D AIGC have shown promise in creating 3D objects directly from text and images, offering significant cost savings in animation and product design. However, detailed editing and customization of 3D assets remain a long-standing challenge. Specifically, 3D generation methods cannot follow finely detailed instructions as precisely as their 2D image generation counterparts. For example, a toy generated through 3D AIGC may come with undesired accessories or clothing. To tackle this challenge, we propose a novel pipeline called Tailor3D, which swiftly creates customized 3D assets from editable dual-side images. We aim to emulate a tailor's ability to locally change objects or perform overall style transfer. Unlike creating 3D assets from multiple views, using dual-side images eliminates the conflicts in overlapping areas that occur when individual views are edited. Specifically, the pipeline begins by editing the front view, then generates the back view of the object through multi-view diffusion, and then edits the back view. Finally, a Dual-sided LRM is proposed to seamlessly stitch together the front and back 3D features, akin to a tailor sewing together the front and back of a garment. The Dual-sided LRM rectifies imperfect consistency between the front and back views, enhancing editing capabilities and reducing memory burdens while integrating them into a unified 3D representation with the LoRA Triplane Transformer. Experimental results demonstrate Tailor3D's effectiveness across various 3D generation and editing tasks, including 3D generative fill and style transfer. It provides a user-friendly, efficient solution for editing 3D assets, with each editing step taking only seconds to complete.
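The order of the four stages described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `Tailor3DStages`, `run_pipeline`, and `demo_stages` are illustrative placeholders rather than the authors' code or API, images are stood in for by plain strings, and the stub stages only record the order in which they are called. In a real system each callable would presumably wrap a 2D image editor, a multi-view diffusion model, and the Dual-sided LRM respectively.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = str  # assumption: a string stands in for real image data


@dataclass
class Tailor3DStages:
    """Hypothetical container for the three kinds of stage in the pipeline."""
    edit_image: Callable[[Image, str], Image]      # 2D edit guided by a prompt
    generate_back_view: Callable[[Image], Image]   # e.g. multi-view diffusion
    fuse_dual_side: Callable[[Image, Image], str]  # e.g. a dual-sided LRM


def run_pipeline(stages: Tailor3DStages, front: Image,
                 front_prompt: str, back_prompt: str) -> str:
    """Run the stages in the order the abstract describes."""
    edited_front = stages.edit_image(front, front_prompt)  # 1. edit front view
    back = stages.generate_back_view(edited_front)         # 2. generate back view
    edited_back = stages.edit_image(back, back_prompt)     # 3. edit back view
    return stages.fuse_dual_side(edited_front, edited_back)  # 4. fuse both sides


def demo_stages(log: List[str]) -> Tailor3DStages:
    """Stub stages that only build strings and log the call order."""
    def edit(image: Image, prompt: str) -> Image:
        log.append("edit")
        return f"{image}+{prompt}"

    def back(image: Image) -> Image:
        log.append("back")
        return f"back({image})"

    def fuse(front: Image, back_img: Image) -> str:
        log.append("fuse")
        return f"asset[{front} | {back_img}]"

    return Tailor3DStages(edit, back, fuse)


if __name__ == "__main__":
    calls: List[str] = []
    print(run_pipeline(demo_stages(calls), "toy", "hat", "cape"))
    print(calls)
```

Because the back view is generated from the already edited front view, the front edit carries over to the back before any back-side edit is applied; only two images ever reach the fusion stage, which is the property the abstract credits with avoiding conflicts in overlapping regions.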
