We introduce a diffusion-transformer (DiT) framework for single-image reflection removal that leverages the generalization strengths of foundation diffusion models in the restoration setting. Rather than relying on task-specific architectures, we repurpose a pretrained DiT-based foundation model by conditioning it on reflection-contaminated inputs and guiding it toward clean transmission layers. We systematically analyze existing reflection removal data sources for diversity, scalability, and photorealism. To address the shortage of suitable data, we construct a physically based rendering (PBR) pipeline in Blender, built around the Principled BSDF, to synthesize realistic glass materials and reflection effects. Efficient LoRA-based adaptation of the foundation model, combined with the proposed synthetic data, achieves state-of-the-art performance on both in-domain and zero-shot benchmarks. These results demonstrate that pretrained diffusion transformers, when paired with physically grounded data synthesis and efficient adaptation, offer a scalable and high-fidelity solution for reflection removal. Project page: https://hf.co/spaces/huawei-bayerlab/windowseat-reflection-removal-web
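
To make the LoRA-based adaptation mentioned above concrete, the following is a minimal NumPy sketch of the general LoRA idea: a frozen pretrained weight matrix is augmented with a trainable low-rank update. It is not the authors' implementation. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the layer sizes are all illustrative assumptions, since the abstract does not state the actual configuration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        # A (down-projection) starts small and random, B (up-projection) starts
        # at zero, so the adapted layer initially equals the pretrained layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank  # assumed scaling convention: alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T            # pretrained path (frozen)
        update = (x @ self.A.T) @ self.B.T  # low-rank path (trainable)
        return base + self.scale * update

    def merged_weight(self):
        # After training, the update can be folded into a single matrix.
        return self.weight + self.scale * (self.B @ self.A)

    def num_trainable(self):
        return self.A.size + self.B.size


# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(1)
W = rng.normal(size=(16, 32))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 32))
y0 = layer(x)
```

In this sketch only `A` and `B` would be updated during fine-tuning, which is 4 × (32 + 16) = 192 values against 512 in the frozen matrix. The saving grows with layer size, which is why this kind of adaptation is considered efficient for large foundation models.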