Masked graph modeling (MGM) is a promising approach for molecular representation learning (MRL). However, extending the success of re-mask decoding from 2D to 3D MGM is non-trivial, primarily because of two conflicting requirements: the decoder must not receive leaked 2D structure, yet it still needs sufficient 2D context to reconstruct the re-masked atoms. To address these challenges, we propose 3D-GSRD: a 3D Molecular Graph Auto-Encoder with Selective Re-mask Decoding. The core innovation of 3D-GSRD is its Selective Re-mask Decoding (SRD), which re-masks only 3D-relevant information from encoder representations while preserving the 2D graph structures. SRD is integrated with a 3D Relational-Transformer (3D-ReTrans) encoder and a structure-independent decoder. Our analysis shows that SRD, combined with the structure-independent decoder, strengthens the encoder's role in MRL. Extensive experiments show that 3D-GSRD achieves strong downstream performance, setting a new state of the art on 7 out of 8 targets in the widely used MD17 molecular property prediction benchmark. The code is released at https://github.com/WuChang0124/3D-GSRD.
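The selective re-masking idea described above can be illustrated with a toy sketch. This is not the authors' implementation (see the released repository for that); it only shows one plausible reading of "re-mask the 3D-relevant information while keeping 2D context". The function name `selective_remask`, the `mask_token` and `context_2d` inputs, and the choice to combine them by addition are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def selective_remask(
    encoder_repr: Sequence[Vector],
    masked_atoms: Sequence[int],
    mask_token: Vector,
    context_2d: Sequence[Vector],
) -> List[Vector]:
    """Build decoder inputs for a toy molecule (hypothetical helper).

    For each masked atom, the encoder representation (which may carry
    3D information) is discarded and replaced by a mask token plus a
    2D-only context vector. Unmasked atoms keep their encoder output.
    Adding the two vectors is an assumption, not the paper's stated rule.
    """
    masked = set(masked_atoms)
    decoder_inputs: List[Vector] = []
    for i, h in enumerate(encoder_repr):
        if i in masked:
            # Re-mask: drop the encoder output, keep only 2D context.
            decoder_inputs.append(
                [m + c for m, c in zip(mask_token, context_2d[i])]
            )
        else:
            # Unmasked atom: pass the encoder representation through.
            decoder_inputs.append(list(h))
    return decoder_inputs


# Toy example: 3 atoms with 2-dimensional representations (made-up values).
encoder_repr = [[0.5, 1.0], [2.0, -1.0], [0.0, 3.0]]
context_2d = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
mask_token = [0.0, 0.0]

decoder_inputs = selective_remask(encoder_repr, [1], mask_token, context_2d)
```

In this sketch the decoder never sees the encoder output for atom 1, so it must reconstruct that atom from the other atoms' representations and the 2D-only context, which is the trade-off the abstract describes.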