Scientific workflows facilitate the computational, data-manipulation, and sometimes visualization steps of scientific data analysis. They typically involve computational steps in scientific simulations and data analysis, and they are vital for reproducing and validating experiments. Domain scientists often develop these workflows in Jupyter notebooks, which are convenient but limited: they struggle to scale to larger data sets, lack fault tolerance, and depend heavily on the stability of the underlying tools and packages. Jup2Kub was developed to address these issues. This software system translates workflows from Jupyter notebooks into a distributed, high-performance Kubernetes environment, improving fault tolerance. It also manages software dependencies so that workflows remain operational as tools and packages change.