The essence of score-based generative models (SGMs) is to optimize a score-based model toward the score function of the data distribution. However, we show that noisy samples induce a different objective function, one that no longer matches the true score function, and therefore misguide model optimization. To address this problem, we first consider a new setting in which every noisy sample is paired with a risk vector indicating its data quality (e.g., noise level). This setting is common in real-world applications, especially for medical and sensor data. We then introduce the risk-sensitive SDE, a type of stochastic differential equation (SDE) parameterized by the risk vector. With this tool, we aim to minimize a measure called perturbation instability, which we define to quantify the negative impact of noisy samples on optimization. We prove that a zero instability measure is achievable only when the noisy samples arise from Gaussian perturbation. For non-Gaussian cases, we further provide the optimal coefficients that minimize the misguidance caused by noisy samples. To apply risk-sensitive SDEs in practice, we extend widely used diffusion models to risk-sensitive versions and derive a risk-free loss that is efficient to compute. Finally, we conduct numerical experiments that confirm the validity of our theorems and show that they make SGMs robust to noisy samples during optimization.