Computational protein design is undergoing a transformation driven by artificial intelligence and machine learning (AI/ML). However, the space of possible protein sequences and structures is astronomically large, even for moderately sized proteins. Consequently, achieving convergence between generated and predicted structures requires substantial computational resources for sampling. Integrated Machine-learning for Protein Structures at Scale (IMPRESS) provides methods and advanced computing systems for coupling AI with high-performance computing (HPC) tasks. This makes it possible to evaluate the effectiveness of protein designs as they are developed, and to evaluate the models and simulations used to generate data and train models. This paper introduces IMPRESS and demonstrates the development and implementation of an adaptive protein design protocol together with its supporting computing infrastructure. The result is more consistent protein design quality and higher design throughput, enabled by dynamic resource allocation and asynchronous workload execution.