In recent years, predicting quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction that otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN or learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, we demonstrate the versatility of our approach by using it to select representative entries in chemical databases, automatically construct coarse-grained force fields, and identify reaction coordinates.