A survey of probabilistic generative frameworks for molecular simulations

Generative artificial intelligence is now a widely used tool in molecular science. Despite the popularity of probabilistic generative models, numerical experiments benchmarking their performance on molecular data are lacking. In this work, we introduce and explain several classes of generative models, broadly sorted into two categories: flow-based models and diffusion models. We select three representative models (Neural Spline Flows, Conditional Flow Matching, and Denoising Diffusion Probabilistic Models) and examine their accuracy, computational cost, and generation speed across datasets with tunable dimensionality, complexity, and modal asymmetry. Our findings are mixed: no single framework is best for all purposes. In brief, (i) Neural Spline Flows are best at capturing the mode asymmetry present in low-dimensional data, (ii) Conditional Flow Matching outperforms the other models on high-dimensional data with low complexity, and (iii) Denoising Diffusion Probabilistic Models appear to be best for low-dimensional data with high complexity. Our datasets include a Gaussian mixture model and the dihedral torsion angle distribution of the Aib\textsubscript{9} peptide, generated via a molecular dynamics simulation. We hope that our taxonomy of probabilistic generative frameworks and our numerical results can guide model selection for a wide range of molecular tasks.
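To make "tunable dimensionality, complexity, and modal asymmetry" concrete, the following is a minimal sketch of how a Gaussian mixture benchmark of this kind might be sampled. It is not the authors' dataset construction: the function name `sample_gaussian_mixture`, the parameters `spread` and `sigma`, and the random placement of the mode centres are all assumptions made for illustration. Here dimensionality is set by `dim`, complexity by the number of modes, and asymmetry by unequal mixture weights.

```python
import numpy as np


def sample_gaussian_mixture(n_samples, dim, weights, spread=4.0, sigma=0.5, seed=0):
    """Draw samples from an isotropic Gaussian mixture (illustrative only).

    dim     -> dimensionality of the data
    weights -> one entry per mode; unequal entries give modal asymmetry
    spread, sigma -> hypothetical knobs for mode separation and mode width
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise to a probability vector
    n_modes = len(weights)

    # Assumption: mode centres are drawn once from a scaled standard normal.
    centres = spread * rng.standard_normal((n_modes, dim))

    # Pick a mode for every sample, then add isotropic Gaussian noise.
    labels = rng.choice(n_modes, size=n_samples, p=weights)
    samples = centres[labels] + sigma * rng.standard_normal((n_samples, dim))
    return samples, labels


if __name__ == "__main__":
    # Two modes in three dimensions with an 80/20 population imbalance.
    x, k = sample_gaussian_mixture(10_000, dim=3, weights=[0.8, 0.2])
    print(x.shape, np.bincount(k) / len(k))
```

A generative model trained on such samples could then be scored on how well it reproduces the mode populations, which is one simple way to probe asymmetry.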

