The prediction of protein 3D structure from amino acid sequence is a computational grand challenge in biophysics, and plays a key role in applications that depend on robust protein structure prediction algorithms, from drug discovery to genome interpretation. The advent of AI models, such as AlphaFold, is revolutionizing these applications. To maximize the impact and improve the usability of these novel AI tools, we introduce APACE (AlphaFold2 and advanced computing as a service), a novel computational framework that effectively handles this AI model and its TB-size database to conduct accelerated protein structure prediction analyses in modern supercomputing environments. We deployed APACE on the Delta supercomputer and quantified its performance for accurate protein structure predictions using four exemplar proteins: 6AWO, 6OAN, 7MEZ, and 6D6U. Using up to 200 ensembles distributed across 50 nodes in Delta, equivalent to 200 NVIDIA A100 GPUs, we found that APACE is up to two orders of magnitude faster than off-the-shelf AlphaFold2 implementations, reducing time-to-solution from weeks to minutes. This computational approach may be readily linked with robotics laboratories to automate and accelerate scientific discovery.