Many computational chemistry and molecular simulation workflows can be expressed as graphs. This abstraction makes it easier to modularize and potentially reuse existing components, to parallelize execution, and to reproduce results. Existing tools represent the computation as a directed acyclic graph (DAG), which allows efficient execution by running independent branches in parallel. However, these systems generally cannot express cyclic and conditional workflows. We therefore developed Maize, a workflow manager for cyclic and conditional graphs based on the principles of flow-based programming. By running each node of the graph concurrently in a separate process and allowing communication at any time through dedicated inter-node channels, Maize can execute arbitrary graph structures. We demonstrate the effectiveness of the tool on a dynamic active learning task in computational drug design, involving a small-molecule generative model and an associated scoring system, and on a reactivity prediction pipeline using quantum-chemistry and semiempirical approaches.
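To illustrate why concurrently running nodes connected by channels can execute a graph that a DAG scheduler cannot, the following is a minimal sketch of a two-node cyclic, conditional workflow. It does not use Maize or its API: the node functions `refine` and `check`, the `STOP` sentinel, and `run_workflow` are hypothetical names chosen for this example, and Python threads with `queue.Queue` stand in for the separate processes and inter-node channels described above. A positive `threshold` is assumed so that the loop terminates.

```python
import queue
import threading

STOP = object()  # sentinel that tells a node to shut down (illustrative convention)


def refine(inbox, outbox):
    """Node 1: halve every value it receives and send it on."""
    while (value := inbox.get()) is not STOP:
        outbox.put(value / 2)
    outbox.put(STOP)  # pass the shutdown signal downstream


def check(inbox, loop_back, results, threshold):
    """Node 2: conditional branch with a cyclic edge back to node 1."""
    while (value := inbox.get()) is not STOP:
        if value > threshold:
            loop_back.put(value)   # not converged: send around the cycle again
        else:
            results.put(value)     # converged: leave the cycle
            loop_back.put(STOP)    # and ask node 1 to shut down


def run_workflow(start, threshold):
    """Wire the two nodes into a cycle, run them concurrently, return the result."""
    to_refine, to_check, results = queue.Queue(), queue.Queue(), queue.Queue()
    nodes = [
        threading.Thread(target=refine, args=(to_refine, to_check)),
        threading.Thread(target=check, args=(to_check, to_refine, results, threshold)),
    ]
    for node in nodes:
        node.start()
    to_refine.put(start)  # inject the initial value into the cycle
    for node in nodes:
        node.join()
    return results.get()


if __name__ == "__main__":
    print(run_workflow(40.0, 1.0))
```

Because both nodes are alive for the whole run and block only on their input channel, the edge from `check` back to `refine` needs no special handling. A scheduler that runs each node once in topological order has no valid ordering for this graph.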