This paper introduces $\infty$-Diff, a generative diffusion model defined in an infinite-dimensional Hilbert space, which can model infinite-resolution data. By training on randomly sampled subsets of coordinates and denoising content only at those locations, we learn a continuous function that can be sampled at arbitrary resolution. Unlike prior neural field-based infinite-dimensional models, which use point-wise functions requiring latent compression, our method employs non-local integral operators to map between Hilbert spaces, allowing spatial context aggregation. This is achieved with an efficient multi-scale function-space architecture that operates directly on raw sparse coordinates, coupled with a mollified diffusion process that smooths out irregularities. In experiments on high-resolution datasets, we find that even at an $8\times$ subsampling rate, our model retains high-quality diffusion. This leads to significant run-time and memory savings, delivers samples with lower FID scores, and scales beyond the training resolution while retaining detail.
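The training idea described above, denoising only at a random subset of coordinates, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `sample_coordinate_subset` and `sparse_denoising_loss`, the variance-preserving noising rule, the noise-prediction loss, and the `denoiser` callable are all assumptions made for illustration. The sketch uses plain white noise and omits the mollification step, and the placeholder `denoiser` stands in for the multi-scale function-space architecture.

```python
import numpy as np


def sample_coordinate_subset(height, width, subsample_rate, rng):
    """Pick a random 1/subsample_rate fraction of pixel locations.

    Returns the flat pixel indices and their (row, col) coordinates
    normalised to [0, 1]. Hypothetical helper, not from the paper.
    """
    n_total = height * width
    n_keep = n_total // subsample_rate
    flat_idx = rng.choice(n_total, size=n_keep, replace=False)
    rows, cols = np.unravel_index(flat_idx, (height, width))
    coords = np.stack([rows / (height - 1), cols / (width - 1)], axis=-1)
    return flat_idx, coords


def sparse_denoising_loss(image, flat_idx, coords, noise_level, denoiser, rng):
    """Noise-prediction loss evaluated only at the sampled locations.

    Assumes a variance-preserving forward step with white noise; the
    mollified (smoothed) noise described in the abstract is omitted here.
    """
    channels = image.shape[-1]
    clean = image.reshape(-1, channels)[flat_idx]      # values at sampled points
    noise = rng.standard_normal(clean.shape)
    noisy = np.sqrt(1.0 - noise_level**2) * clean + noise_level * noise
    predicted_noise = denoiser(noisy, coords, noise_level)
    return float(np.mean((predicted_noise - noise) ** 2))


# Usage: an 8x subsampling rate on a 64x64 RGB image keeps 512 of 4096 pixels.
rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64, 3))
flat_idx, coords = sample_coordinate_subset(64, 64, 8, rng)


def dummy_denoiser(noisy, coords, noise_level):
    # Placeholder model that always predicts zero noise.
    return np.zeros_like(noisy)


loss = sparse_denoising_loss(image, flat_idx, coords, 0.5, dummy_denoiser, rng)
```

Because the loss touches only the sampled points, the per-step cost in this sketch scales with the number of retained coordinates rather than with the full grid, which is the intuition behind the run-time and memory savings the abstract reports.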