Activity cliffs (ACs), generally defined as pairs of structurally similar molecules that are active against the same biological target but differ significantly in binding potency, are of great importance to drug discovery. To date, the AC prediction problem, i.e., predicting whether a pair of molecules exhibits an AC relationship, has not been fully explored. In this paper, we first introduce ACNet, a large-scale dataset for AC prediction. ACNet curates over 400K Matched Molecular Pairs (MMPs) against 190 targets, including over 20K MMP-cliffs and 380K non-AC MMPs, and provides five subsets for model development and evaluation. We then propose a baseline framework to benchmark the predictive performance of molecular representations encoded by deep neural networks for AC prediction, and evaluate 16 models in experiments. Our experimental results show that deep learning models can achieve good performance when trained on tasks with an adequate amount of data, while the imbalanced, low-data, and out-of-distribution characteristics of the ACNet dataset remain challenging for deep neural networks. In addition, the traditional ECFP method shows a natural advantage on MMP-cliff prediction and outperforms the deep learning models on most of the data subsets. To the best of our knowledge, our work constructs the first large-scale dataset for AC prediction, which may stimulate the study of AC prediction models and prompt further breakthroughs in AI-aided drug discovery. The code and dataset are available at https://drugai.github.io/ACNet/.
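To make the pairwise task concrete, the following is a minimal sketch of how a matched molecular pair might be labeled as an MMP-cliff or a non-AC pair from its potency gap, and of the class imbalance implied by the reported counts. The names `MoleculePair`, `label_pair`, and `positive_fraction`, the log-scale potency unit, and the thresholds `cliff_gap` and `non_cliff_gap` are all illustrative assumptions; the abstract does not state the cutoffs ACNet uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MoleculePair:
    """A matched molecular pair measured against the same target (hypothetical type)."""
    smiles_a: str
    smiles_b: str
    potency_a: float  # assumed to be on a negative log10 scale, e.g. pKi
    potency_b: float


def label_pair(pair: MoleculePair,
               cliff_gap: float = 2.0,
               non_cliff_gap: float = 1.0) -> Optional[int]:
    """Return 1 for an MMP-cliff, 0 for a non-AC pair, None if ambiguous.

    Both thresholds are placeholder assumptions, not values from the abstract.
    """
    gap = abs(pair.potency_a - pair.potency_b)
    if gap >= cliff_gap:
        return 1
    if gap <= non_cliff_gap:
        return 0
    return None  # gap falls between the two thresholds; pair is left unlabeled


def positive_fraction(n_cliffs: int, n_non_cliffs: int) -> float:
    """Fraction of labeled pairs that are cliffs (a measure of class imbalance)."""
    return n_cliffs / (n_cliffs + n_non_cliffs)


if __name__ == "__main__":
    # Placeholder SMILES and potencies, for illustration only.
    pair = MoleculePair("CCO", "CCN", potency_a=8.5, potency_b=5.0)
    print(label_pair(pair))
    # Rounded counts from the abstract: over 20K cliffs, 380K non-AC pairs.
    print(positive_fraction(20_000, 380_000))
```

With the rounded counts reported above, roughly one labeled pair in twenty is a cliff, which is the kind of imbalance the experiments identify as a difficulty for deep models.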