Sampling from generative models has become a crucial tool for applications such as data synthesis and augmentation. Diffusion models, Flow Matching, and Continuous Normalizing Flows have proven effective across a range of modalities, and all rely on Gaussian latent variables for generation. For search-based or creative applications that require additional control over the generation process, it has become common to manipulate the latent variable directly. However, existing approaches to such manipulations (e.g., interpolation or forming low-dimensional representations) work well only in special cases, or are specific to a particular network or data modality. We propose Combination of Gaussian variables (COG) as a general-purpose interpolation method that is easy to implement yet outperforms recent, more sophisticated methods. Moreover, COG naturally addresses the broader task of forming general linear combinations of latent variables. This allows subspaces of the latent space to be constructed, which dramatically simplifies the creation of expressive low-dimensional spaces of high-dimensional objects.
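To illustrate why naive linear combinations of Gaussian latents can be problematic, the following minimal sketch uses only the Python standard library. It relies on a standard fact: a weighted sum of independent N(0, 1) variables with weights w_k has variance equal to the sum of the squared weights, so dividing by the square root of that sum restores unit variance. The function names (`combine_latents`, `interpolate`, `sample_variance`) are hypothetical, and the sketch assumes i.i.d. standard normal latents; it is not presented as the exact COG formulation, which may differ (for example, for latents with a non-zero mean or a general covariance).

```python
import math
import random


def combine_latents(latents, weights):
    """Variance-preserving linear combination of i.i.d. N(0, 1) latent vectors.

    Illustrative helper (hypothetical, not taken from the paper): assumes every
    latent is an independent sample from a standard normal distribution.
    """
    if len(latents) != len(weights):
        raise ValueError("need exactly one weight per latent")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    dim = len(latents[0])
    # A weighted sum of independent N(0, 1) entries has variance sum(w_k^2),
    # so dividing by sqrt(sum(w_k^2)) brings the variance back to 1.
    return [
        sum(w * x[i] for w, x in zip(weights, latents)) / norm
        for i in range(dim)
    ]


def interpolate(x0, x1, t):
    """Interpolate between two latents with weights (1 - t, t), then rescale."""
    return combine_latents([x0, x1], [1.0 - t, t])


def sample_variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Two independent high-dimensional latents drawn from N(0, I).
rng = random.Random(0)
dim = 20_000
x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
x1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# The naive midpoint has per-coordinate variance near 0.5, which a model
# trained on N(0, I) latents would rarely see; the rescaled one stays near 1.
naive_mid = [0.5 * a + 0.5 * b for a, b in zip(x0, x1)]
rescaled_mid = interpolate(x0, x1, 0.5)

print("naive midpoint variance:   ", round(sample_variance(naive_mid), 3))
print("rescaled midpoint variance:", round(sample_variance(rescaled_mid), 3))
```

The same `combine_latents` helper accepts any number of latents and arbitrary weights, so, under the same assumptions, it can also be used to map low-dimensional coordinates onto a subspace spanned by a few basis latents.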