In this methods article, we present a flexible yet easy-to-use implementation of Direct Coupling Analysis (DCA) based on Boltzmann machine learning, together with a tutorial on its use. The package \texttt{adabmDCA 2.0} is available in three programming languages (C++, Julia, Python) and runs on different architectures (single-core and multi-core CPU, GPU) through a common front-end interface. In addition to several learning protocols for dense and sparse generative DCA models, it directly supports common downstream tasks such as residue-residue contact prediction, mutational-effect prediction, scoring of sequence libraries, and generation of artificial sequences for sequence design. It is readily applicable to both protein and RNA sequence data.