Coordinate denoising is a promising 3D molecular pre-training method that has achieved remarkable performance on various downstream drug discovery tasks. Theoretically, its objective is equivalent to learning the force field, which has been shown to help downstream tasks. Nevertheless, coordinate denoising faces two challenges in learning an effective force field: low-coverage samples and an isotropic force field. The underlying reason is that the molecular distributions assumed by existing denoising methods fail to capture the anisotropic character of molecules. To tackle these challenges, we propose a novel hybrid noise strategy that perturbs both dihedral angles and coordinates. However, denoising such hybrid noise in the traditional way is no longer equivalent to learning the force field. Through theoretical derivation, we find that the problem is caused by the dependence of the covariance on the input conformation. We therefore propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which denoises only the coordinate noise, the latter of the two. In this way, Frad enjoys both merits: it samples more low-energy structures and retains the force field equivalence. Extensive experiments show the effectiveness of Frad in molecular representation, setting a new state of the art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.
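The hybrid noise and the fractional denoising target described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`rotate_about_axis`, `hybrid_noise`, `fractional_denoising_loss`), the four-atom toy chain, the single rotatable bond, the noise scales, and the plain mean-squared-error loss are all assumptions made for illustration, and no neural network is involved.

```python
import numpy as np

def rotate_about_axis(points, origin, axis, angle):
    """Rotate points about the line through `origin` along `axis` (Rodrigues' formula)."""
    k = axis / np.linalg.norm(axis)
    p = points - origin
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    rotated = (p * cos_a
               + np.cross(k, p) * sin_a
               + np.outer(p @ k, k) * (1.0 - cos_a))
    return rotated + origin

def hybrid_noise(coords, bond, moving_idx, sigma_dihedral, sigma_coord, rng):
    """Apply dihedral noise first, then coordinate noise (hypothetical helper).

    bond       : (j, k) indices of the atoms defining the rotatable bond
    moving_idx : indices of the atoms on the far side of the bond
    """
    j, k = bond
    # Step 1: perturb the dihedral angle by a Gaussian-distributed rotation.
    delta = rng.normal(0.0, sigma_dihedral)
    x_dihedral = coords.copy()
    x_dihedral[moving_idx] = rotate_about_axis(
        coords[moving_idx], coords[j], coords[k] - coords[j], delta
    )
    # Step 2: add isotropic Gaussian noise to every Cartesian coordinate.
    eps = rng.normal(0.0, sigma_coord, size=coords.shape)
    return x_dihedral + eps, eps, x_dihedral

def fractional_denoising_loss(predicted_noise, coord_noise):
    """Regress only the coordinate noise; the dihedral noise is not a target."""
    return float(np.mean((predicted_noise - coord_noise) ** 2))

# Toy four-atom chain; atom 3 rotates about the bond between atoms 1 and 2.
rng = np.random.default_rng(0)
coords = np.array([[-0.5, 1.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [2.0, 1.0, 0.5]])
x_noisy, eps, x_dihedral = hybrid_noise(
    coords, bond=(1, 2), moving_idx=[3],
    sigma_dihedral=0.5, sigma_coord=0.04, rng=rng,  # illustrative scales only
)
```

In this sketch a model would take `x_noisy` as input and its output would be compared with `eps` alone, which is the sense in which only a fraction of the applied noise is denoised. The dihedral rotation changes the conformation while preserving bond lengths, which is why it can reach structures that small coordinate noise alone would not.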