Ongoing advances in force field and computer hardware development enable the use of molecular dynamics (MD) to simulate increasingly complex systems, with the ultimate goal of reaching cellular complexity. At the same time, rational design by high-throughput (HT) simulations is another frontier of MD. In both areas, the Martini coarse-grained (CG) force field, especially its latest version (v3), is being actively explored because it offers enhanced spatio-temporal resolution. However, the automation tools for preparing Martini simulations that accompanied the previous version of the force field were not designed for HT simulations or for studies of complex cellular systems, and they have therefore become a major limiting factor. To address these shortcomings, we present the open-source Vermouth Python library. Vermouth is designed to become the unified framework for developing programs that prepare, run, and analyze Martini simulations of complex systems. To demonstrate the power of the Vermouth library, we showcase the Martinize2 program, a generalization of the martinize script, which was originally aimed at setting up simulations of proteins. Unlike its predecessor, Martinize2 automatically handles protonation states and post-translational modifications in proteins, offers more options to fine-tune structural biases such as the elastic network, and can convert non-protein molecules such as ligands. Finally, Martinize2 is used in two high-complexity benchmarks: the entire I-TASSER protein template database and a subset of 200,000 structures from the AlphaFold Protein Structure Database are converted to CG resolution, and we illustrate how checks on input structure quality can safeguard high-throughput applications.