Sampling from generative models has become a crucial tool for applications such as data synthesis and augmentation. Diffusion models, Flow Matching, and Continuous Normalizing Flows have proven effective across many modalities, and all rely on Gaussian latent variables for generation. For search-based or creative applications that require additional control over the generation process, it has become common to manipulate the latent variable directly. However, existing approaches to such manipulations (e.g., interpolation or forming low-dimensional representations) work well only in special cases or are specific to a particular network or data modality. We propose Combination of Gaussian variables (COG), a general-purpose method for forming linear combinations of latent variables while adhering to the assumptions of the generative model. COG is easy to implement, yet it outperforms recent, more sophisticated methods for interpolation. Because COG naturally addresses the broader task of forming linear combinations, it affords new capabilities. These include the construction of subspaces of the latent space, which dramatically simplifies the creation of expressive low-dimensional spaces of high-dimensional objects.
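The core idea of forming linear combinations that respect a Gaussian prior can be illustrated with a short sketch. It assumes the latents are independent draws from a standard normal N(0, I), with zero mean and identity covariance. Under that assumption a weighted sum Σ w_i z_i is distributed as N(0, ‖w‖² I), so dividing it by ‖w‖₂ returns it to the prior. A naive average such as the midpoint 0.5·z0 + 0.5·z1 instead has variance 0.5 and falls off the typical set the model was trained on. This is a minimal NumPy illustration under that assumption, not the authors' reference implementation. The function names `combine_gaussian_latents`, `interpolate` and `subspace_point` are hypothetical. A prior with nonzero mean or non-identity covariance would need an additional correction that this sketch omits.

```python
import numpy as np


def combine_gaussian_latents(latents, weights):
    """Hypothetical helper: linearly combine i.i.d. N(0, I) latents so the result is again N(0, I).

    Assumption: every latent z_i is an independent draw from a zero-mean,
    identity-covariance Gaussian. Then sum_i w_i z_i ~ N(0, ||w||^2 I), so
    dividing by the Euclidean norm of the weights restores unit variance.
    """
    latents = np.asarray(latents, dtype=float)  # shape (k, ...) : k latents
    weights = np.asarray(weights, dtype=float)  # shape (k,)
    if latents.shape[0] != weights.shape[0]:
        raise ValueError("need exactly one weight per latent")
    norm = np.linalg.norm(weights)
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    # Weighted sum over the first axis, then rescale back to unit variance.
    return np.tensordot(weights, latents, axes=1) / norm


def interpolate(z0, z1, t):
    """Hypothetical helper: interpolate between two latents with weights (1 - t, t), rescaled."""
    return combine_gaussian_latents([z0, z1], [1.0 - t, t])


def subspace_point(basis_latents, coords):
    """Hypothetical helper: map low-dimensional coordinates to a latent spanned by basis latents."""
    return combine_gaussian_latents(basis_latents, coords)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 100_000  # large dimension so empirical statistics are stable
    z0, z1 = rng.standard_normal((2, d))

    naive_mid = 0.5 * z0 + 0.5 * z1  # variance 0.5, std near sqrt(0.5)
    rescaled_mid = interpolate(z0, z1, 0.5)  # std near 1.0
    print("naive std:", naive_mid.std(), "rescaled std:", rescaled_mid.std())

    # A point in the 3-latent subspace given by arbitrary coordinates.
    basis = rng.standard_normal((3, d))
    p = subspace_point(basis, [0.2, -1.0, 0.7])
    print("subspace point std:", p.std())
```

The rescaling keeps the endpoints of an interpolation unchanged (t = 0 returns z0 and t = 1 returns z1) while preventing the loss of norm that plain linear interpolation causes in high dimensions. The same helper also covers subspaces: any coordinate vector over a set of basis latents maps to a latent that is again distributed as N(0, I).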