Sampling from generative models has become a crucial tool for applications like data synthesis and augmentation. Diffusion models, flow matching, and continuous normalizing flows have proven effective across many modalities, and all rely on Gaussian latent variables for generation. For search-based or creative applications that require additional control over the generation process, it has become common to manipulate the latent variable directly. However, existing approaches to such manipulations (e.g., interpolation or forming low-dimensional representations) work well only in special cases or are specific to a network or data modality. We propose Combination of Gaussian variables (COG) as a general-purpose interpolation method that is easy to implement yet outperforms recent sophisticated methods. Moreover, COG naturally addresses the broader task of forming general linear combinations of latent variables, which allows subspaces of the latent space to be constructed and dramatically simplifies the creation of expressive low-dimensional spaces of high-dimensional objects.
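To make the latent-manipulation problem concrete, the following is a minimal sketch, not the COG procedure itself, whose details are not given here. It assumes the latents are independent standard-normal vectors and uses only the standard fact that a weighted sum of such vectors has covariance equal to the sum of squared weights times the identity, so dividing by the square root of that sum restores the distribution the model expects. The names `combine_latents`, `interpolate`, and `sample_variance` are hypothetical and chosen for illustration.

```python
import math
import random


def combine_latents(latents, weights):
    """Weighted sum of latents, rescaled to keep unit variance.

    Assumption: each latent is an independent N(0, I) vector. Then
    sum_k w_k * x_k has covariance (sum_k w_k^2) * I, so dividing by
    sqrt(sum_k w_k^2) gives a vector that is again N(0, I).
    """
    if len(latents) != len(weights):
        raise ValueError("need one weight per latent")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    dim = len(latents[0])
    return [
        sum(w * x[i] for w, x in zip(weights, latents)) / norm
        for i in range(dim)
    ]


def interpolate(x0, x1, t):
    """Interpolate between two latents using weights (1 - t, t)."""
    return combine_latents([x0, x1], [1.0 - t, t])


def sample_variance(v):
    """Population variance of the entries of a vector."""
    mean = sum(v) / len(v)
    return sum((a - mean) ** 2 for a in v) / len(v)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 20_000
    x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Plain linear interpolation shrinks the variance at the midpoint.
    naive = [0.5 * a + 0.5 * b for a, b in zip(x0, x1)]
    rescaled = interpolate(x0, x1, 0.5)

    print("naive midpoint variance:   ", round(sample_variance(naive), 3))
    print("rescaled midpoint variance:", round(sample_variance(rescaled), 3))
```

Under these assumptions the naive midpoint has per-coordinate variance close to 0.5, whereas the rescaled midpoint stays close to 1, and the endpoints `t = 0` and `t = 1` return the original latents unchanged. The same `combine_latents` call accepts any number of latents and arbitrary weights, which is the sense in which interpolation is a special case of a general linear combination.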