Unpaired Image Deraining Using Reward-Guided Self-Reinforcement Strategy

Unsupervised deraining has attracted attention because it can learn the real-world distribution of rain without paired supervision. However, the lack of strong constraints makes it difficult for the network to converge, especially given the diversity and complexity of rain degradation. Our key motivation is the observation that high-quality deraining results occasionally emerge during training and can be leveraged to guide the optimization process. Building on this observation, we introduce RGSUD (Reward-Guided Self-Reinforcement Unsupervised Image Deraining), which comprises two key stages: reward recycling and self-reinforcement (SR) training. In the former stage, we propose a dynamic reward recycling mechanism based on Image Quality Assessment (IQA) that selects the best derained outputs during training and continuously collects these high-quality derained images as rewards. In the latter stage, we incorporate these rewards into the model's optimization process, constraining the optimization space and improving the alignment between derained outputs and clean images. By combining an IQA-based self-reinforced loss with dynamically updated rewards, we enhance the quality of the synthesized pseudo-paired data and stabilize the optimization. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance across multiple datasets, including paired synthetic, paired real, and unpaired real images, outperforming existing unsupervised deraining approaches on both subjective and objective IQA metrics. Additionally, we show that the self-reinforcement strategy can be adapted to other unsupervised deraining methods and that our deraining framework generalizes well across existing supervised deraining networks.
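
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `RewardBank`, `toy_iqa_score`, and `l1_to_reward` are hypothetical, the scoring function is a toy stand-in for a real no-reference IQA model, and the plain L1 distance is only an assumed form of the self-reinforced loss, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[float]]  # toy grayscale image as nested lists (assumption)


def toy_iqa_score(img: Image) -> float:
    """Hypothetical IQA stand-in: less horizontal variation scores higher.

    A real system would call a learned no-reference IQA model here.
    """
    variation = sum(
        abs(row[i + 1] - row[i]) for row in img for i in range(len(row) - 1)
    )
    return -variation


class RewardBank:
    """Keeps, per rainy input, the best derained output seen so far."""

    def __init__(self, score_fn: Callable[[Image], float]) -> None:
        self.score_fn = score_fn
        self._best: Dict[str, Tuple[float, Image]] = {}

    def update(self, key: str, candidate: Image) -> bool:
        """Recycle the candidate as the new reward if it scores strictly higher."""
        score = self.score_fn(candidate)
        stored = self._best.get(key)
        if stored is None or score > stored[0]:
            self._best[key] = (score, candidate)
            return True
        return False

    def get(self, key: str) -> Optional[Image]:
        """Return the current reward image for this input, if any."""
        stored = self._best.get(key)
        return None if stored is None else stored[1]


def l1_to_reward(output: Image, reward: Image) -> float:
    """Assumed self-reinforced loss: mean absolute difference to the reward."""
    diffs = [
        abs(o - r)
        for out_row, rew_row in zip(output, reward)
        for o, r in zip(out_row, rew_row)
    ]
    return sum(diffs) / len(diffs)
```

In a training loop under these assumptions, each derained output would be passed to `RewardBank.update` (reward recycling), and the stored reward for the same input would then be used as a pseudo-target through `l1_to_reward` (self-reinforcement training).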
