This paper makes two contributions aimed at improving the speed and quality of image generation through inverse diffusion processes. The first is to reparameterize the diffusion process in terms of the angle $\eta$ on a quarter-circular arc between the image and the noise, specifically by setting the conventional $\sqrt{\bar{\alpha}}=\cos(\eta)$. This reparameterization eliminates two singularities and allows the diffusion evolution to be expressed as a well-behaved ordinary differential equation (ODE), which in turn allows higher-order ODE solvers such as Runge-Kutta methods to be used effectively. The second is to estimate both the image ($\mathbf{x}_0$) and the noise ($\boldsymbol{\epsilon}$) directly with our network. This enables more stable calculation of the update step in the inverse diffusion process, because accurate estimates of the image and of the noise are each crucial at different stages of the process. Together, these changes yield faster generation, converging on high-quality images more quickly, and higher-quality generated images, as measured by Fréchet Inception Distance (FID), spatial Fréchet Inception Distance (sFID), precision, and recall.
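To make the reparameterization concrete, the following is a minimal sketch and not the paper's implementation. It assumes the usual variance-preserving convention, so that $\sqrt{1-\bar{\alpha}}=\sin(\eta)$ and a noisy sample can be written as $\mathbf{x}(\eta)=\cos(\eta)\,\mathbf{x}_0+\sin(\eta)\,\boldsymbol{\epsilon}$. Differentiating with respect to $\eta$ gives $d\mathbf{x}/d\eta=\cos(\eta)\,\boldsymbol{\epsilon}-\sin(\eta)\,\mathbf{x}_0$, a right-hand side that needs no division by $\cos(\eta)$ or $\sin(\eta)$ when both quantities are estimated directly. The names `velocity`, `rk4_sample`, and `model` are hypothetical, and classical fourth-order Runge-Kutta stands in for whichever solver the authors actually use.

```python
import math

def velocity(x, eta, model):
    """Right-hand side dx/d(eta), assuming x(eta) = cos(eta)*x0 + sin(eta)*eps.

    `model` is a hypothetical callable returning estimates (x0_hat, eps_hat).
    """
    x0_hat, eps_hat = model(x, eta)
    return math.cos(eta) * eps_hat - math.sin(eta) * x0_hat

def rk4_sample(model, x_start, n_steps=10):
    """Integrate from eta = pi/2 (pure noise) to eta = 0 (image) with classical RK4."""
    # Uniform grid in the angle eta; the step h below is negative.
    etas = [(math.pi / 2) * (1 - i / n_steps) for i in range(n_steps + 1)]
    x = x_start
    for eta, eta_next in zip(etas[:-1], etas[1:]):
        h = eta_next - eta
        k1 = velocity(x, eta, model)
        k2 = velocity(x + 0.5 * h * k1, eta + 0.5 * h, model)
        k3 = velocity(x + 0.5 * h * k2, eta + 0.5 * h, model)
        k4 = velocity(x + h * k3, eta + h, model)
        x = x + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x
```

In an actual sampler, `model` would be the trained network and `x_start` a draw of Gaussian noise. Because `eta` is a scalar, the same code works unchanged whether `x` is a float or an array type that supports scalar arithmetic.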