Deep generative models (DGMs) play two key roles in modern machine learning: (i) producing new information (e.g., image synthesis) and (ii) reducing dimensionality. However, traditional architectures often rely on auxiliary networks, such as the encoder in a Variational Autoencoder (VAE) or the discriminator in a Generative Adversarial Network (GAN), which introduce training instability, computational overhead, and risks such as mode collapse. We present NeuroSQL, a new generative paradigm that eliminates the need for auxiliary networks by learning low-dimensional latent representations implicitly. NeuroSQL leverages an asymptotic approximation that expresses the latent variables as the solution to an optimal transport problem. Specifically, NeuroSQL learns the latent variables by solving a linear assignment problem and then passes the latent information to a standalone generator. We benchmark its performance against GANs, VAEs, and a budget-matched diffusion baseline on four datasets: handwritten digits (MNIST), faces (CelebA), animal faces (AFHQ), and brain images (OASIS). Compared to VAEs, GANs, and diffusion models: (1) in terms of image quality, NeuroSQL achieves an overall lower mean pixel distance between synthetic and authentic images and stronger perceptual and structural fidelity; (2) computationally, NeuroSQL requires the least training time; and (3) practically, NeuroSQL provides an effective solution for generating synthetic data from limited training samples. By using quantile assignment rather than an encoder, NeuroSQL provides a fast, stable, and robust way to generate synthetic data with minimal information loss.
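As a rough illustration of the assignment step described above, the following sketch matches data points to a fixed set of latent codes by solving a linear assignment problem with SciPy. It is not the NeuroSQL implementation. The one-dimensional normal-quantile latents, the linear `toy_generator`, the squared-error cost, and the helper names `latent_quantiles` and `assign_latents` are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import norm


def latent_quantiles(n):
    """Hypothetical latent codes: n standard-normal quantiles, shape (n, 1)."""
    probs = (np.arange(n) + 0.5) / n
    return norm.ppf(probs).reshape(-1, 1)


def toy_generator(z, weights):
    """Stand-in for the standalone generator (assumed here to be a linear map)."""
    return z @ weights


def assign_latents(data, latents, weights):
    """Match each data point to exactly one latent code.

    The cost of pairing data point i with latent j is assumed to be the
    squared error between data[i] and the generator output for latents[j].
    """
    generated = toy_generator(latents, weights)        # shape (n, d)
    diff = data[:, None, :] - generated[None, :, :]    # shape (n, n, d)
    cost = (diff ** 2).sum(axis=-1)                    # cost[i, j]
    rows, cols = linear_sum_assignment(cost)           # optimal one-to-one matching
    return cols, cost[rows, cols].sum()


# Toy data: generator outputs in shuffled order, plus small noise.
rng = np.random.default_rng(0)
n = 8
weights = np.array([[1.0, -0.5, 2.0]])                 # maps 1-D latents to 3-D "images"
latents = latent_quantiles(n)
true_perm = rng.permutation(n)
data = toy_generator(latents[true_perm], weights) + 0.01 * rng.normal(size=(n, 3))

# assignment[i] is the index of the latent code matched to data[i].
assignment, total_cost = assign_latents(data, latents, weights)
```

In this toy setting the noise is small relative to the spacing of the latent codes, so the matching recovers the shuffled order. In a training loop one would presumably alternate this assignment with generator updates, but that schedule is likewise an assumption rather than something the abstract specifies.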