As diffusion models advance rapidly, addressing the potential risks of dataset bias becomes increasingly important. Because generated outputs inherit bias directly from the training data, mitigating latent bias is key to improving both the quality and the proportions of generated samples. This paper proposes time-dependent importance reweighting to mitigate bias in diffusion models. We demonstrate that the time-dependent density ratio is more precise than the ratios used in previous approaches, thereby minimizing error propagation in generative learning. While applying it directly to score matching is intractable, we find that using the time-dependent density ratio for both reweighting and score correction yields a tractable objective function that recovers the unbiased data density. Furthermore, we theoretically establish a connection with traditional score matching and demonstrate convergence to an unbiased distribution. Experiments support the usefulness of the proposed method, which outperforms baselines, including time-independent importance reweighting, on CIFAR-10, CIFAR-100, FFHQ, and CelebA under various bias settings. Our code is available at https://github.com/alsdudrla10/TIW-DSM.
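To make the idea of using the density ratio for both reweighting and score correction concrete, the following is a minimal sketch on a 1-D toy problem and is not taken from the linked repository. It assumes unit-variance Gaussian data and biased distributions, so that the time-dependent density ratio between the perturbed densities has a closed form; in practice the ratio would have to be estimated. The function names (`log_ratio`, `grad_log_ratio`, `tiw_dsm_loss`), the perturbation `x_t = alpha * x_0 + sigma * eps`, and the omission of any time weighting are all illustrative assumptions.

```python
import numpy as np


def log_ratio(x_t, alpha, sigma, mu_data, mu_bias):
    """Log of the time-dependent density ratio, log p_data^t(x_t) - log p_bias^t(x_t).

    Toy assumption: both clean distributions are unit-variance Gaussians, so each
    perturbed density is N(alpha * mu, alpha**2 + sigma**2).
    """
    var = alpha**2 + sigma**2
    return ((x_t - alpha * mu_bias) ** 2 - (x_t - alpha * mu_data) ** 2) / (2.0 * var)


def grad_log_ratio(x_t, alpha, sigma, mu_data, mu_bias):
    """Gradient of log_ratio with respect to x_t (constant in this Gaussian toy case)."""
    var = alpha**2 + sigma**2
    return np.full_like(x_t, alpha * (mu_data - mu_bias) / var)


def tiw_dsm_loss(score, x_0, x_t, alpha, sigma, mu_data, mu_bias):
    """Reweighted, score-corrected denoising score matching loss on biased samples.

    score : model output s(x_t, t) evaluated at x_t (hypothetical model)
    x_0   : clean samples drawn from the *biased* distribution
    x_t   : perturbed samples, x_t = alpha * x_0 + sigma * eps
    """
    # Reweighting: the ratio moves the expectation from p_bias^t to p_data^t.
    weight = np.exp(log_ratio(x_t, alpha, sigma, mu_data, mu_bias))
    # Usual denoising target: the score of the Gaussian perturbation kernel.
    dsm_target = -(x_t - alpha * x_0) / sigma**2
    # Score correction: shift the target by the gradient of the log ratio.
    target = dsm_target + grad_log_ratio(x_t, alpha, sigma, mu_data, mu_bias)
    return np.mean(weight * (score - target) ** 2)
```

In this sketch, when the biased and unbiased distributions coincide the weight is 1 and the correction is 0, so the loss reduces to ordinary denoising score matching. This is consistent with the stated connection to traditional score matching, though only as a toy illustration.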