Domain Adaptation based on Human Feedback for Enhancing Generative Model Denoising Abilities

How can human feedback be applied to generative models? To answer this question, we present a method that uses human feedback for domain adaptation in image denoising. Deep generative models have demonstrated impressive results in image denoising. However, current denoising models often produce inappropriate results when applied to domains different from the ones they were trained on. If a model yields both 'Good' and 'Bad' results on unseen data, how can the quality of the 'Bad' results be raised? Most existing methods rely on improving model generalization, but they require target images for training or for adapting to the unseen domain. In this paper, we adapt to the unseen domain without target images and improve the specific images on which the model failed. To this end, we propose fine-tuning a generator on its inappropriate results in a different domain by utilizing human feedback. First, we train a generator to denoise images using only noisy MNIST images of the digit '0'. This denoising generator, trained on the source domain, produces unintended results when applied to target-domain images. To achieve domain adaptation, we construct a dataset pairing noisy images with their denoised generated outputs and train a reward model to predict human feedback. Finally, we fine-tune the generator on the target domain using the reward model together with an auxiliary loss function, aiming to transfer its denoising capabilities to the target domain. Our approach demonstrates the potential to efficiently fine-tune a generator trained on one domain using human feedback from another domain, thereby enhancing its denoising abilities in different domains.
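
The pipeline described in the abstract (collect feedback on generated results, fit a reward model, then fine-tune the generator against that reward plus an auxiliary loss) can be illustrated with a deliberately small toy. The sketch below is not the authors' implementation. Everything in it is an assumption made for illustration: the two-parameter `generator`, the hand-crafted `features`, the logistic-regression reward model, the simulated annotator, and the form of the auxiliary loss (a penalty on drifting away from the source-trained generator's outputs). The abstract does not specify any of these.

```python
import math
import random

random.seed(0)  # make the toy run reproducible

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def generator(params, noisy):
    # Toy stand-in for the denoising generator: a per-pixel squashing curve.
    gain, bias = params
    return [sigmoid(gain * (p - 0.5) + bias) for p in noisy]

def features(noisy, denoised):
    # Hypothetical hand-crafted features of a (noisy, denoised) pair.
    n = len(denoised)
    binariness = sum(abs(2.0 * d - 1.0) for d in denoised) / n
    change = sum(abs(d - x) for x, d in zip(noisy, denoised)) / n
    return [binariness, change, 1.0]  # trailing 1.0 is the bias feature

def reward(weights, noisy, denoised):
    # Predicted probability that a human labels the result 'Good'.
    return sigmoid(sum(w * f for w, f in zip(weights, features(noisy, denoised))))

def train_reward_model(feedback, epochs=100, lr=0.5):
    # Logistic regression fitted by SGD on (noisy, denoised, label) triples.
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for noisy, denoised, label in feedback:
            err = reward(weights, noisy, denoised) - label
            f = features(noisy, denoised)
            weights = [w - lr * err * fi for w, fi in zip(weights, f)]
    return weights

def objective(params, ref_params, weights, batch, aux_weight):
    # Negative mean reward plus an auxiliary penalty on drift from the
    # source-trained generator's outputs (the penalty form is an assumption).
    total = 0.0
    for noisy in batch:
        out = generator(params, noisy)
        ref = generator(ref_params, noisy)
        drift = sum((a - b) ** 2 for a, b in zip(out, ref)) / len(out)
        total += -reward(weights, noisy, out) + aux_weight * drift
    return total / len(batch)

def fine_tune(ref_params, weights, batch, aux_weight=0.1, steps=30, lr=1.0, eps=1e-4):
    # Finite-difference gradient descent; a step is kept only if it lowers the loss.
    params = list(ref_params)
    loss = objective(params, ref_params, weights, batch, aux_weight)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            bumped_loss = objective(bumped, ref_params, weights, batch, aux_weight)
            grad.append((bumped_loss - loss) / eps)
        candidate = [p - lr * g for p, g in zip(params, grad)]
        candidate_loss = objective(candidate, ref_params, weights, batch, aux_weight)
        if candidate_loss < loss:
            params, loss = candidate, candidate_loss
        else:
            lr *= 0.5  # step was too large; retry with a smaller one
    return params, loss

# Synthetic 16-pixel "target-domain" images with additive Gaussian noise.
clean = [[float(random.random() < 0.3) for _ in range(16)] for _ in range(12)]
batch = [[min(1.0, max(0.0, p + random.gauss(0.0, 0.3))) for p in img] for img in clean]

source_params = [2.0, 0.0]  # pretend these came from source-domain training

# Simulated annotator: a real pipeline would collect these labels from people.
# Several gains are used here only so the toy data set contains varied outputs.
feedback = []
for noisy in batch:
    for gain in (1.0, 2.0, 6.0, 12.0):
        denoised = generator([gain, 0.0], noisy)
        crispness = sum(abs(2.0 * d - 1.0) for d in denoised) / len(denoised)
        feedback.append((noisy, denoised, 1.0 if crispness > 0.6 else 0.0))

reward_weights = train_reward_model(feedback)
start_loss = objective(source_params, source_params, reward_weights, batch, 0.1)
tuned_params, tuned_loss = fine_tune(source_params, reward_weights, batch)
print("loss before:", start_loss, "after:", tuned_loss)
```

Note that the fine-tuning step never touches the clean images: only the noisy inputs, the frozen source-trained generator, and the reward model enter `objective`. The clean images are used solely to synthesize the noisy batch. In a realistic setting the generator and reward model would be neural networks trained with automatic differentiation rather than finite differences.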
