The prediction of protein 3D structure from amino acid sequence is a computational grand challenge in biophysics, and plays a key role in applications ranging from drug discovery to genome interpretation. The advent of AI models, such as AlphaFold, is revolutionizing applications that depend on robust protein structure prediction algorithms. To maximize the impact and ease the usability of these novel AI tools, we introduce APACE (AlphaFold2 and advanced computing as a service), a novel computational framework that effectively handles this AI model and its TB-size database to conduct accelerated protein structure prediction analyses in modern supercomputing environments. We deployed APACE on the Delta and Polaris supercomputers, and quantified its performance for accurate protein structure predictions using four exemplar proteins: 6AWO, 6OAN, 7MEZ, and 6D6U. Using up to 300 ensembles, distributed across 200 NVIDIA A100 GPUs, we found that APACE is up to two orders of magnitude faster than off-the-shelf AlphaFold2 implementations, reducing time-to-solution from weeks to minutes. This computational approach may be readily linked with robotics laboratories to automate and accelerate scientific discovery.