Diffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, whereas the development of 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging, for two reasons. First, text prompts cannot describe clothing in sufficient detail. Second, 2D VTON results generated from different viewpoints of the same 3D scene lack coherence and spatial relationships, frequently leading to appearance inconsistencies and geometric distortions. To resolve these problems, we introduce an image-prompted 3D VTON method, dubbed GS-VTON, which leverages 3D Gaussian Splatting (3DGS) as the 3D representation to transfer pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. (1) Specifically, we propose a personalized diffusion model that uses low-rank adaptation (LoRA) fine-tuning to incorporate personalized information into pre-trained 2D VTON models. To achieve effective LoRA training, we introduce a reference-driven image editing approach that edits multi-view images simultaneously while ensuring their consistency. (2) Furthermore, we propose a persona-aware 3DGS editing framework that enables effective editing while maintaining a consistent cross-view appearance and high-quality 3D geometry. (3) Additionally, we establish a new 3D VTON benchmark, 3D-VTONBench, which supports comprehensive qualitative and quantitative 3D VTON evaluations. Through extensive experiments and comparisons with existing methods, GS-VTON demonstrates superior fidelity and advanced editing capabilities, affirming its effectiveness for 3D VTON.
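The personalization step above relies on standard LoRA fine-tuning, in which a frozen pre-trained weight matrix is augmented with a trainable low-rank residual. The sketch below illustrates that mechanism on a toy linear layer; the class name, dimensions, and initialization scheme are illustrative assumptions, not the paper's implementation, which fine-tunes a pre-trained 2D VTON diffusion model.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank residual (alpha/r) * B @ A.

    A minimal, self-contained sketch of the LoRA update rule; all names
    and hyperparameters here are illustrative, not from GS-VTON itself.
    """

    def __init__(self, d_out, d_in, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))        # frozen, pre-trained
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init
        self.scale = alpha / rank                          # standard LoRA scaling factor

    def __call__(self, x):
        # y = W x + (alpha/r) * B (A x); with B zero-initialized, the
        # adapted layer exactly reproduces the frozen model at the start
        # of fine-tuning, and only A and B receive gradient updates.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(d_out=6, d_in=4)
x = np.ones(4)
# At initialization the LoRA branch contributes nothing:
assert np.allclose(layer(x), layer.W @ x)
```

Because only the low-rank factors `A` and `B` are trained, the adapter adds few parameters relative to the frozen base model, which is what makes per-subject personalization of a large pre-trained diffusion model tractable.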