Scientific workflows have become integral tools in a broad range of scientific computing use cases. Scientific discovery increasingly depends on workflows to orchestrate large and complex experiments, ranging from the execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, the need to process large-scale datasets produced by instruments at the edge, the intensification of near real-time data processing, support for long-term experiment campaigns, and the emergence of quantum computing as an adjunct to HPC have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge to the cloud to HPC, enable the management of many small files, allow data reduction while ensuring high accuracy, and orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities. Further, to accelerate science, these systems must also implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, and enable the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit, which took place on November 29 and 30, 2022.