Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian processes (GPs). They are adept at modelling data consisting of few observations of many related functions on the same input space, and they are trained by minimizing a variational objective, which is computationally much cheaper than the Bayesian updating required by GPs. So far, most studies of NPs have focused on low-dimensional datasets that are not representative of realistic transfer learning tasks. Drug discovery is one application area characterized by datasets of many chemical properties or functions that are sparsely observed yet depend on shared features or representations of the molecular inputs. This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models. CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as to an alternative transfer learning model based on pre-training and refining neural network regressors. We present a Bayesian optimization experiment that showcases the probabilistic nature of CNPs, and we discuss the model's shortcomings in uncertainty quantification.