Gaussian Probability Path based Generative Models (GPPGMs) generate data by reversing a stochastic process that progressively corrupts samples with Gaussian noise. Despite state-of-the-art results in 3D molecular generation, their deployment is hindered by the high cost of long generative trajectories, which often require hundreds to thousands of steps during training and sampling. In this work, we propose a principled method, named GAGA, that improves generation efficiency without sacrificing the training granularity or inference fidelity of GPPGMs. Our key insight is that different data modalities attain sufficient Gaussianity at markedly different steps of the forward process. Based on this observation, we analytically identify a characteristic step at which molecular data attains sufficient Gaussianity, after which the trajectory can be replaced by a closed-form Gaussian approximation. Unlike existing accelerators that coarsen or reformulate trajectories, our approach preserves full-resolution learning dynamics while avoiding redundant transport through truncated distributional states. Experiments on 3D molecular generation benchmarks demonstrate that GAGA achieves substantial improvements in both generation quality and computational efficiency.
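To make the idea of a "characteristic step" concrete, the sketch below is a minimal illustration, not the paper's actual criterion: it assumes a standard variance-preserving forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, under which q(x_t | x_0) is Gaussian with a closed-form KL divergence to the standard normal N(0, I). One can then search for the first step where this KL falls below a tolerance; steps beyond it could be replaced by sampling the closed-form Gaussian directly. The schedule, tolerance, and function names here are illustrative assumptions.

```python
import numpy as np

def kl_to_standard_normal(mean, var):
    """Per-dimension KL( N(mean, var*I) || N(0, I) ), in closed form."""
    return 0.5 * (var + mean**2 - 1.0 - np.log(var))

def truncation_step(x0, abar, tol=1e-3):
    """Smallest step t whose mean per-dim KL(q(x_t|x_0) || N(0, I)) < tol.

    x0:   (d,) data point
    abar: (T,) cumulative alpha-bar schedule of a VP forward process
    Both the schedule and the tolerance are illustrative assumptions,
    not the criterion used by GAGA.
    """
    for t, a in enumerate(abar):
        mean = np.sqrt(a) * x0       # q(x_t | x_0) mean shrinks toward 0
        var = 1.0 - a                # variance grows toward 1
        if kl_to_standard_normal(mean, var).mean() < tol:
            return t
    return len(abar) - 1

# Example: linear beta schedule over T = 1000 steps (a common default).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)
x0 = np.random.default_rng(0).normal(size=64)
t_star = truncation_step(x0, abar)
# Trajectory segments after t_star are nearly indistinguishable from
# N(0, I), so transporting through them adds cost but little signal.
```

The point of the sketch is that, once the forward marginal is close enough to the reference Gaussian, the remaining portion of the trajectory carries no usable information about x_0 and can be skipped without affecting the resolution at which earlier steps are learned.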