Learning Topology-Specific Experts for Molecular Property Prediction

Recently, graph neural networks (GNNs) have been successfully applied to predicting molecular properties, one of the most classical cheminformatics tasks, with various applications. Despite their effectiveness, we empirically observe that training a single GNN model on diverse molecules with distinct structural patterns limits its prediction performance. Motivated by this observation, we propose to leverage topology-specific prediction models (referred to as experts), each of which is responsible for one group of molecules sharing similar topological semantics. That is, each expert learns topology-specific discriminative features by being trained on its corresponding topological group. To tackle the key challenge of grouping molecules by their topological patterns, we introduce a clustering-based gating module that assigns an input molecule to one of the clusters, and we further optimize the gating module with two types of self-supervision: topological semantics induced by GNNs and molecular scaffolds. Extensive experiments demonstrate that the proposed method improves performance on molecular property prediction and generalizes better than baselines to new molecules with unseen scaffolds. The code is available at https://github.com/kimsu55/ToxExpert.
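
The gating idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes molecule embeddings have already been produced by some GNN (random vectors stand in for them here), uses a Student's t kernel for the cluster assignment as one plausible choice, relaxes the assignment to a soft weighting, and models each expert as a single linear layer. The names `soft_assign`, `predict`, `centroids`, `expert_w`, and `expert_b` are hypothetical.

```python
import numpy as np


def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster assignment of embeddings to centroids.

    Uses a Student's t kernel; this kernel is an assumption made for
    illustration, not something stated in the abstract.
    """
    # Squared Euclidean distance from each embedding to each centroid: (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize so each row is a distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)


def predict(z, centroids, expert_w, expert_b):
    """Combine per-cluster linear experts, weighted by the gate."""
    q = soft_assign(z, centroids)                             # (n, k)
    logits = np.einsum("nd,kd->nk", z, expert_w) + expert_b   # (n, k)
    return (q * logits).sum(axis=1), q


rng = np.random.default_rng(0)
z = rng.normal(size=(5, 8))          # stand-in for GNN molecule embeddings
centroids = rng.normal(size=(3, 8))  # one centroid per topological group
expert_w = rng.normal(size=(3, 8))   # one linear expert per group
expert_b = np.zeros(3)

y, q = predict(z, centroids, expert_w, expert_b)
```

Each row of `q` sums to one, so the output for a molecule is dominated by the expert whose centroid lies closest to its embedding. Replacing the weighted sum with `q.argmax(axis=1)` would give the hard, one-cluster-per-molecule assignment that the abstract describes.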

