Disentangling visual layers in real-world images is a persistent challenge in vision and graphics, as such layers often involve non-linear and globally coupled interactions, including shading, reflection, and perspective distortion. In this work, we present an in-context image decomposition framework that leverages large diffusion foundation models for layered separation. We focus on the challenging case of logo-object decomposition, where the goal is to disentangle a logo from the surface on which it appears while faithfully preserving both layers. Our method fine-tunes a pretrained diffusion model via lightweight LoRA adaptation and introduces a cycle-consistent tuning strategy that jointly trains decomposition and composition models, enforcing reconstruction consistency between decomposed and recomposed images. This bidirectional supervision substantially enhances robustness in cases where the layers exhibit complex interactions. Furthermore, we introduce a progressive self-improving process, which iteratively augments the training set with high-quality model-generated examples to refine performance. Extensive experiments demonstrate that our approach achieves accurate and coherent decompositions and also generalizes effectively across other decomposition types, suggesting its potential as a unified framework for layered image decomposition.
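The cycle-consistent tuning idea can be illustrated with a toy reconstruction-consistency penalty. The sketch below is only an assumption about the general shape of such an objective, not the paper's actual loss or training code: `cycle_consistency_loss`, `toy_compose`, `toy_decompose`, and the fixed binary `mask` are hypothetical stand-ins for the LoRA-tuned diffusion models, and mean squared error is chosen purely for illustration.

```python
import numpy as np

def cycle_consistency_loss(decompose, compose, image, logo, surface):
    """Sum of both reconstruction cycles, each measured with mean squared error."""
    # Forward cycle: image -> (logo, surface) -> recomposed image.
    pred_logo, pred_surface = decompose(image)
    forward = np.mean((compose(pred_logo, pred_surface) - image) ** 2)
    # Backward cycle: (logo, surface) -> composite -> recovered layers.
    rec_logo, rec_surface = decompose(compose(logo, surface))
    backward = (np.mean((rec_logo - logo) ** 2)
                + np.mean((rec_surface - surface) ** 2))
    return forward + backward

# Hypothetical toy setup: a fixed binary mask marks where the logo sits.
rng = np.random.default_rng(0)
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

def toy_compose(logo, surface):
    # Paste the logo over the surface inside the mask (no shading or warping).
    return mask * logo + (1.0 - mask) * surface

def toy_decompose(image):
    # Exact inverse of toy_compose for layers that respect the mask.
    return mask * image, (1.0 - mask) * image

logo = mask * rng.random((8, 8))
surface = (1.0 - mask) * rng.random((8, 8))
image = toy_compose(logo, surface)

# A consistent pair of models incurs no penalty.
consistent = cycle_consistency_loss(toy_decompose, toy_compose, image, logo, surface)
# A decomposer that copies the input into both layers is penalized.
inconsistent = cycle_consistency_loss(lambda x: (x, x), toy_compose, image, logo, surface)
```

In this toy setting the penalty vanishes only when decomposition and composition invert each other, which is the property the bidirectional supervision is meant to encourage. Real logo-object interactions such as shading, reflection, and perspective distortion are not captured by the simple masked paste used here.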