Gaussian Probability Path based Generative Models (GPPGMs) generate data by reversing a stochastic process that progressively corrupts samples with Gaussian noise. Despite state-of-the-art results in 3D molecular generation, their deployment is hindered by the high cost of long generative trajectories, which often require hundreds to thousands of steps during training and sampling. In this work, we propose a principled method, named GAGA, to improve generation efficiency without sacrificing the training granularity or inference fidelity of GPPGMs. Our key insight is that different data modalities attain sufficient Gaussianity at markedly different steps of the forward process. Based on this observation, we analytically identify a characteristic step at which molecular data attains sufficient Gaussianity, after which the trajectory can be replaced by a closed-form Gaussian approximation. Unlike existing accelerators that coarsen or reformulate trajectories, our approach preserves full-resolution learning dynamics while avoiding redundant transport through truncated distributional states. Experiments on 3D molecular generation benchmarks demonstrate that GAGA achieves substantial improvements in both generation quality and computational efficiency.
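The core idea — locating the step at which the noised marginal becomes close enough to the Gaussian prior that the rest of the trajectory can be replaced in closed form — can be sketched numerically. The sketch below is illustrative only and is not the paper's actual criterion: it assumes a DDPM-style variance-preserving schedule, where q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), and measures Gaussianity by the closed-form KL divergence to the standard normal prior; the function name, schedule, and tolerance are all hypothetical choices.

```python
import numpy as np

def truncation_step(x0, betas, tol=1e-3):
    """Return the first forward-process step t at which the noised marginal
    q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I) is within `tol`
    (in KL divergence) of the standard normal prior N(0, I).

    Illustrative sketch: assumes a DDPM-style variance-preserving schedule.
    """
    abar = np.cumprod(1.0 - betas)  # cumulative product alpha-bar_t
    d = x0.size
    for t, a in enumerate(abar):
        mu_sq = a * np.sum(x0 ** 2)  # squared norm of the mean sqrt(a)*x0
        var = 1.0 - a                # per-dimension variance
        # KL( N(mu, var*I) || N(0, I) ) in closed form
        kl = 0.5 * (mu_sq + d * (var - 1.0 - np.log(var)))
        if kl < tol:
            return t
    return len(betas) - 1  # never sufficiently Gaussian within the schedule
```

For example, with a linear schedule of 1000 steps the returned step falls well before the end of the trajectory, so all later steps could in principle be skipped and sampled directly from the prior. Heavier-tailed or more structured modalities would yield a later characteristic step, which is the modality-dependence the abstract refers to.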