Contemporary graph learning algorithms are poorly suited to large molecules because they do not account for the hierarchical interactions among atoms, which are essential for determining the properties of macromolecules. In this work, we propose Multiresolution Graph Transformers (MGT), the first graph transformer architecture that can learn to represent large molecules at multiple scales. MGT learns representations for individual atoms and groups them into meaningful functional groups or repeating units. We also introduce Wavelet Positional Encoding (WavePE), a new positional encoding method that guarantees localization in both the spectral and spatial domains. Our model achieves competitive results on two macromolecule datasets, one of polymers and one of peptides, and on one drug-like molecule dataset. Importantly, on the polymer dataset it outperforms other state-of-the-art methods and achieves chemical accuracy in estimating molecular properties (e.g., GAP, HOMO, and LUMO) calculated by Density Functional Theory (DFT). Furthermore, visualizations of the clustering results on macromolecules and of the low-dimensional spaces of their representations demonstrate that our method learns to represent long-range and hierarchical structures. Our PyTorch implementation is publicly available at https://github.com/HySonLab/Multires-Graph-Transformer
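To make the idea of a wavelet-based positional encoding more concrete, the following is a minimal sketch of one common construction of graph wavelets: the heat kernel of the normalized graph Laplacian evaluated at several scales. It is an illustration only and is not taken from the authors' implementation. The abstract does not specify how WavePE is computed, so the choice of the heat kernel, the scale values, and the function name `heat_wavelets` are all assumptions made here.

```python
import numpy as np


def heat_wavelets(adj, scales=(0.5, 1.0, 2.0)):
    """Hypothetical helper: heat-kernel graph wavelets at several scales.

    Returns an array of shape (num_scales, n, n) whose s-th slice is
    U exp(-s * Lambda) U^T, where L = U Lambda U^T is the symmetric
    normalized Laplacian of the graph. This is a generic construction,
    assumed for illustration, not the paper's exact WavePE definition.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)

    # D^{-1/2}, leaving isolated nodes (degree 0) at zero.
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # L is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(lap)

    # Scaling the columns of U by exp(-s * lambda) applies the spectral filter.
    return np.stack(
        [(eigvecs * np.exp(-s * eigvals)) @ eigvecs.T for s in scales]
    )


# Toy example: a path graph with 5 nodes (0 - 1 - 2 - 3 - 4).
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

wavelets = heat_wavelets(adj)
```

Row `i` of each slice describes how a unit of "heat" placed on node `i` has diffused at that scale: small scales stay concentrated near the node, while larger scales spread over a wider neighborhood. In this sketch, such rows (or a learned projection of them) would play the role of node positional features.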