Graph neural networks (GNNs) have recently been applied successfully to molecular property prediction, a classical cheminformatics task with various applications. Despite their effectiveness, we empirically observe that training a single GNN model on diverse molecules with distinct structural patterns limits its prediction performance. Motivated by this observation, we propose TopExpert, which leverages topology-specific prediction models (referred to as experts), each responsible for one group of molecules that share similar topological semantics. That is, each expert learns topology-specific discriminative features by being trained on its corresponding topological group. To tackle the key challenge of grouping molecules by their topological patterns, we introduce a clustering-based gating module that assigns an input molecule to one of the clusters, and we further optimize the gating module with two types of self-supervision: topological semantics induced by GNNs and molecular scaffolds. Extensive experiments demonstrate that TopExpert improves molecular property prediction performance and generalizes better than baselines to new molecules with unseen scaffolds. The code is available at https://github.com/kimsu55/ToxExpert.
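To make the idea of cluster-gated experts concrete, the following is a minimal sketch under stated assumptions, not the released implementation. It assumes that a GNN has already produced a fixed-size embedding `z` for a molecule, that the gate is a softmax over negative squared distances to cluster centroids, and that each expert is a single linear head. The names `soft_assign`, `predict`, `centroids`, and `experts` are hypothetical and chosen only for illustration; the self-supervised gating losses are not shown.

```python
import math


def soft_assign(z, centroids, temperature=1.0):
    """Soft cluster assignment for one embedding.

    Assumption: softmax over negative squared Euclidean distances
    to the centroids. The abstract does not specify the gate's form.
    """
    logits = [
        -sum((a - b) ** 2 for a, b in zip(z, c)) / temperature
        for c in centroids
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(z, centroids, experts):
    """Combine per-expert scores using the gate weights.

    Each expert is a hypothetical linear head (weights, bias),
    one expert per cluster.
    """
    q = soft_assign(z, centroids)
    scores = [
        sum(w_i * z_i for w_i, z_i in zip(w, z)) + b
        for w, b in experts
    ]
    return sum(q_k * s_k for q_k, s_k in zip(q, scores)), q


# Toy data: two clusters in a 2-D embedding space, one expert per cluster.
centroids = [[0.0, 0.0], [5.0, 5.0]]
experts = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
z = [0.1, -0.2]  # stand-in for a GNN embedding of one molecule

logit, q = predict(z, centroids, experts)
```

Here the embedding lies close to the first centroid, so the gate places almost all of its weight on the first expert and the combined score is dominated by that expert's output. A hard assignment to a single cluster, as the abstract describes, corresponds to taking the argmax of `q`.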