Expressivity and Generalization: Fragment-Biases for Molecular GNNs

Although recent advances in higher-order Graph Neural Networks (GNNs) improve theoretical expressiveness and molecular property prediction, they often fall short of the empirical performance of models that explicitly use fragment information as an inductive bias. For these fragment-biased approaches, however, no theoretical expressivity study exists. In this work, we propose the Fragment-WL test, an extension of the well-known Weisfeiler & Leman (WL) test, which enables the theoretical analysis of fragment-biased GNNs. Building on the insights gained from the Fragment-WL test, we develop a new GNN architecture and a fragmentation with infinite vocabulary that significantly boosts expressiveness. We show the effectiveness of our model on synthetic and real-world data, where we outperform all GNNs on Peptides, achieve 12% lower error than all GNNs on ZINC, and achieve 34% lower error than other fragment-biased models. Furthermore, we show that our model generalizes better than the latest transformer-based architectures, positioning it as a robust solution for a range of molecular modeling tasks.
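
For readers unfamiliar with the WL test mentioned above, the following is a minimal sketch of the standard 1-WL color refinement, not the Fragment-WL test proposed in the paper. The function names (`wl_histogram`, `cycle`) and the hand-assigned ring labels (`C|ring6`, `C|ring3`) are assumptions made for illustration only. The sketch shows a classic case that plain 1-WL cannot separate, a six-membered ring versus two disjoint triangles, and how adding ring information to the initial node labels is enough to tell them apart.

```python
from collections import Counter


def wl_histogram(adj, labels, rounds=3):
    """Standard 1-WL color refinement (illustrative sketch).

    adj:    dict mapping each node to a list of its neighbours
    labels: dict mapping each node to an initial label (e.g. atom type)
    Returns the multiset of node colors after `rounds` refinement steps.
    """
    colors = dict(labels)
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbour colors).
        # Nested tuples serve as canonical colors, so histograms from
        # different graphs can be compared directly.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


def cycle(nodes):
    """Adjacency dict of a simple cycle over the given nodes."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}


hexagon = cycle([0, 1, 2, 3, 4, 5])
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}

# With identical atom labels, every node in both graphs has degree 2,
# so 1-WL assigns the same colors and cannot separate the graphs.
plain = {v: "C" for v in range(6)}
same_plain = wl_histogram(hexagon, plain) == wl_histogram(two_triangles, plain)

# Hypothetical ring-size labels, assigned by hand for this illustration.
hex_ring = {v: "C|ring6" for v in range(6)}
tri_ring = {v: "C|ring3" for v in range(6)}
same_ring = wl_histogram(hexagon, hex_ring) == wl_histogram(two_triangles, tri_ring)

print(same_plain, same_ring)
```

Here `same_plain` is `True` and `same_ring` is `False`: the two graphs are indistinguishable under plain 1-WL but separable once ring membership enters the initial labels. This only hints at why fragment information can raise expressiveness; the paper's Fragment-WL test formalizes the idea in its own way.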

