Pretrained neural networks have attracted significant interest in chemistry and small-molecule drug design. Embeddings from these models are widely used for molecular property prediction, virtual screening, and low-data learning in molecular chemistry. This study presents the most extensive comparison of such models to date, evaluating 25 models across 25 datasets. Under a fair comparison framework, we assess models spanning various modalities, architectures, and pretraining strategies. Using a dedicated hierarchical Bayesian statistical testing model, we arrive at a surprising result: nearly all neural models show negligible or no improvement over the baseline ECFP molecular fingerprint. Only the CLAMP model, which is itself based on molecular fingerprints, performs statistically significantly better than the alternatives. These findings raise concerns about the evaluation rigor of existing studies. We discuss potential causes, propose solutions, and offer practical recommendations.