Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks and the high cost of prospective validation. To close this gap, we propose a benchmark based on docking, a widely used computational method for assessing how well a molecule binds to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking program. We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained on a realistically sized training set. This suggests a limitation of the current generation of models for de novo drug design. Finally, we propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it. We release the benchmark as an easy-to-use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone towards the goal of automatically generating promising drug candidates.