Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen, Tianjing Zhang, Philipp Junk, Michelle Dimon, Paweł Gniewek, Fabian Ortega, McKinley Polen, Ivan Grubisic, Ali Bashir, Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang, Jian Peng, Anthony Costa, Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli

From arXiv, 23 pages, 10 figures

Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by the hardware memory requirements of models like AlphaFold 3, which impose a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open-source reference architectures and implement custom multidimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling: for an $N$-token input distributed across $P$ GPUs, per-device memory scales as $O(N^2/P)$, enabling structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of the Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as the folding, without cropping, of the disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.
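To give a rough sense of what the $O(N^2/P)$ scaling means in practice, the following back-of-the-envelope sketch estimates the per-device footprint of a single $N \times N \times C$ pair tensor split evenly across $P$ GPUs. The channel width (128), the 2-byte element size, and the perfectly even split are illustrative assumptions rather than figures from the paper, and the helper name `pair_bytes_per_device` is hypothetical. Real memory use also includes activations, weights, and communication buffers, which are not counted here.

```python
import math


def pair_bytes_per_device(
    n_tokens: int,
    n_devices: int,
    channels: int = 128,      # assumed pair-channel width, not taken from the paper
    bytes_per_elem: int = 2,  # assumed 16-bit storage (e.g. bf16)
) -> int:
    """Bytes per device for one N x N x C tensor sharded evenly over P devices."""
    total_bytes = n_tokens * n_tokens * channels * bytes_per_elem
    # Even split: each device holds roughly a 1/P share of the N^2 entries.
    return math.ceil(total_bytes / n_devices)


if __name__ == "__main__":
    n = 30_000  # token count on the order quoted in the abstract
    for p in (1, 4, 16, 64):
        gib = pair_bytes_per_device(n, p) / 2**30
        print(f"P={p:>2}: {gib:8.2f} GiB per device for one pair tensor")
```

Under these assumptions, a single unsharded pair tensor at this size would be larger than the memory of any single current GPU, while the 64-way split brings each shard down to a few GiB. This is consistent with the quadratic term being the bottleneck that context parallelism is meant to address.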

