We present a new model for generating molecular data by combining discrete and continuous diffusion processes. The model generates a complete representation of a molecule: atom features, the discrete 2D molecular graph, and continuous 3D atom coordinates. Using diffusion processes lets the model capture the probabilistic nature of molecular processes and makes it possible to explore how different factors affect molecular structures and properties. We also propose a novel graph transformer architecture that serves as the denoising network for the diffusion process. The transformer is equivariant to Euclidean transformations: it learns atom and edge representations that are invariant to these transformations while keeping the atom coordinates equivariant. It can therefore also be used to learn molecular representations that are robust to geometric transformations. We evaluate the model experimentally and compare it with existing methods, showing that it generates more stable and valid molecules with good properties. Our model is a promising approach for designing molecules with desired properties and can be applied to a wide range of molecular modeling tasks.
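The abstract does not specify the noise schedules or transition kernels. The sketch below shows what a joint forward (noising) process over the three molecular components could look like. It rests on several assumptions that do not come from the source. All components share one cosine schedule. Coordinate noise is Gaussian and projected to zero center of mass. Atom and bond types use a uniform categorical transition kernel, as in D3PM-style discrete diffusion. The function names are hypothetical, and the denoising graph transformer is not shown.

```python
import math
import random


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal fraction alpha_bar_t of a cosine schedule (assumed choice)."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def noise_coordinates(coords, alpha_bar, rng):
    """Continuous forward step: x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * eps.

    eps is projected to zero center of mass, so a centered molecule stays
    centered (a common way to handle translations; assumed here).
    """
    eps = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in coords]
    mean = [sum(e[d] for e in eps) / len(eps) for d in range(3)]
    eps = [[e[d] - mean[d] for d in range(3)] for e in eps]
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * x[d] + s * e[d] for d in range(3)] for x, e in zip(coords, eps)]


def noise_categorical(labels, num_classes, alpha_bar, rng):
    """Discrete forward step with a uniform kernel:
    q(c_t | c_0) = alpha_bar * onehot(c_0) + (1 - alpha_bar) / K.
    """
    out = []
    for c in labels:
        if rng.random() < alpha_bar:
            out.append(c)  # keep the clean category
        else:
            out.append(rng.randrange(num_classes))  # resample uniformly over K classes
    return out


def noise_molecule(atom_types, bonds, coords, t, T, n_atom_types, n_bond_types, seed=0):
    """Hypothetical joint noising of atom types, bond matrix and 3D coordinates at step t."""
    rng = random.Random(seed)
    ab = cosine_alpha_bar(t, T)
    n = len(atom_types)
    # Noise each unordered atom pair once (bond type 0 = "no bond").
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    flat = noise_categorical([bonds[i][j] for i, j in pairs], n_bond_types, ab, rng)
    noisy_bonds = [[0] * n for _ in range(n)]
    for (i, j), b in zip(pairs, flat):
        noisy_bonds[i][j] = noisy_bonds[j][i] = b  # keep the adjacency symmetric
    noisy_atoms = noise_categorical(atom_types, n_atom_types, ab, rng)
    noisy_coords = noise_coordinates(coords, ab, rng)
    return noisy_atoms, noisy_bonds, noisy_coords
```

At t = 0 the schedule gives alpha_bar = 1, so the molecule is returned unchanged. As t approaches T, the atom and bond types approach uniform noise and the coordinates approach centered Gaussian noise. A denoising network would be trained to reverse these steps jointly.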