A Geometric Unification of Generative AI with Manifold-Probabilistic Projection Models

Most generative AI models for images assume that images are inherently low-dimensional objects embedded in a high-dimensional space. It is also often implicitly assumed that thematic image datasets form smooth or piecewise-smooth manifolds. Common approaches overlook this geometric structure and focus solely on probabilistic methods, approximating the probability distribution with universal approximation techniques such as the kernel method. In some generative models, the low-dimensional nature of the data manifests itself through a lower-dimensional latent space. Yet the probability distribution in the latent space, or in the manifold's coordinate space, is treated as uninteresting and is either predefined or assumed to be uniform. In this study, we address the problem of Blind Image Denoising (BID) and, to some extent, the problem of generating images from noise by unifying the geometric and probabilistic perspectives. We introduce a novel framework that improves upon existing probabilistic approaches by incorporating geometric assumptions that enable the effective use of kernel-based probabilistic methods. Furthermore, the proposed framework extends prior geometric approaches by combining explicit and implicit manifold descriptions through the introduction of a distance function. The resulting framework demystifies diffusion models by interpreting them as a projection mechanism onto the manifold of "good images". This interpretation leads to a new deterministic model, the Manifold-Probabilistic Projection Model (MPPM), which operates in both the representation (pixel) space and the latent space. We demonstrate that the Latent MPPM (LMPPM) outperforms the Latent Diffusion Model (LDM) across various datasets, achieving superior results in both image restoration and image generation.
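The idea of denoising as projection onto a data manifold via a kernel-based distance function can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the MPPM itself: the "manifold of good data" is assumed to be a densely sampled unit circle in R^2, and the distance function is a hypothetical kernel-smoothed squared distance f_sigma(x) = -sigma^2 log sum_i exp(-||x - y_i||^2 / (2 sigma^2)). Its gradient is x - sum_i w_i y_i, where w_i are softmax kernel weights, so a gradient step moves a noisy point toward the kernel-weighted mean of nearby samples. The decreasing bandwidth schedule `sigmas` is likewise an assumption, loosely analogous to the coarse-to-fine noise levels of a diffusion sampler.

```python
import numpy as np


def soft_distance_grad(x, samples, sigma):
    """Gradient of the (hypothetical) kernel-smoothed squared distance f_sigma at x."""
    sq = np.sum((samples - x) ** 2, axis=1)      # ||x - y_i||^2 for every sample
    logits = -sq / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())            # numerically stable softmax weights
    w /= w.sum()
    return x - w @ samples                       # grad f_sigma = x - sum_i w_i y_i


def project(x, samples, sigmas, step=0.5, iters_per_sigma=20):
    """Gradient descent on f_sigma with an assumed decreasing bandwidth schedule."""
    x = np.asarray(x, dtype=float).copy()
    for sigma in sigmas:
        for _ in range(iters_per_sigma):
            x -= step * soft_distance_grad(x, samples, sigma)
    return x


# Toy "manifold of good data": 2000 samples on the unit circle in R^2.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
samples = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# A clean point on the manifold, corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = np.array([np.cos(1.0), np.sin(1.0)])
noisy = clean + 0.3 * rng.standard_normal(2)

denoised = project(noisy, samples, sigmas=[0.5, 0.2, 0.05, 0.01])
print("radius before:", np.linalg.norm(noisy), "radius after:", np.linalg.norm(denoised))
```

With `step=1` each update collapses to the kernel-weighted average of the samples (a mean-shift step); the smaller step used here only damps the iteration. Note that the procedure returns a nearby point on the manifold, not necessarily the original clean point, which is the sense in which denoising is a projection.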
