Scientific workflows are central to scientific data analysis and often involve computationally intensive processing of large datasets on compute clusters. Their execution therefore tends to be long-running and resource-intensive, resulting in significant energy consumption and carbon emissions. While carbon-aware computing methods have received considerable attention in general cloud contexts, their application to scientific data analysis workflows remains a critical research gap. This study addresses that gap by showing how the delay tolerance, interruptibility, and scalability of scientific workflows can be leveraged for a significantly more sustainable execution model. We first quantify the carbon emissions associated with running scientific workflows and then demonstrate the potential of carbon-aware workflow execution. We estimate the carbon footprint of seven real-world Nextflow workflows executed on diverse dedicated cluster and public cloud resources, using high-resolution average and marginal grid carbon intensity data from open and commercial data providers. We then systematically evaluate the impact of carbon-aware temporal shifting and of dynamically pausing and resuming workflow execution, and investigate the impact of resource scaling at both the workflow and the task level. We find substantial potential reductions in overall carbon emissions: temporal shifting can decrease emissions by over 80%, and resource scaling by 67%.
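To make the idea of temporal shifting and pausing/resuming concrete, the following is a minimal sketch, not the authors' method. It assumes a constant power draw, an hourly grid carbon intensity forecast in gCO2e/kWh that is known in advance, and zero overhead for pausing and resuming. Emissions are estimated as energy (kWh) multiplied by carbon intensity. The function names `emissions_g`, `best_start`, and `best_interruptible` are hypothetical.

```python
from typing import List, Sequence, Tuple

SLOT_H = 1.0  # assumption: forecast resolution of one hour per slot


def emissions_g(power_kw: float, intensities: Sequence[float]) -> float:
    """gCO2e for running at a constant power_kw through the given hourly slots.

    Energy per slot (kWh) = power_kw * SLOT_H; emissions = energy * intensity.
    """
    return sum(power_kw * SLOT_H * ci for ci in intensities)


def best_start(intensity: Sequence[float], runtime_h: int,
               deadline_h: int, power_kw: float) -> Tuple[int, float]:
    """Temporal shifting (hypothetical helper): pick the start hour of one
    contiguous run that minimises emissions and still finishes by deadline_h."""
    if runtime_h < 1 or runtime_h > deadline_h or deadline_h > len(intensity):
        raise ValueError("infeasible window")
    best_hour, best_e = 0, float("inf")
    for start in range(deadline_h - runtime_h + 1):
        e = emissions_g(power_kw, intensity[start:start + runtime_h])
        if e < best_e:
            best_hour, best_e = start, e
    return best_hour, best_e


def best_interruptible(intensity: Sequence[float], runtime_h: int,
                       deadline_h: int, power_kw: float) -> Tuple[List[int], float]:
    """Pause/resume (hypothetical helper): run only in the runtime_h cleanest
    hours before the deadline, assuming pausing and resuming are free."""
    if runtime_h < 1 or runtime_h > deadline_h or deadline_h > len(intensity):
        raise ValueError("infeasible window")
    # Rank hours by intensity (ties broken by hour index) and keep the cleanest.
    hours = sorted(range(deadline_h), key=lambda h: (intensity[h], h))[:runtime_h]
    hours.sort()
    return hours, emissions_g(power_kw, [intensity[h] for h in hours])


if __name__ == "__main__":
    forecast = [400, 150, 380, 350, 200, 180, 300, 420]  # illustrative values
    print(emissions_g(2.0, forecast[0:3]))            # start immediately
    print(best_start(forecast, 3, 8, 2.0))            # shifted contiguous run
    print(best_interruptible(forecast, 3, 8, 2.0))    # paused and resumed
```

With these made-up numbers, a 3-hour, 2 kW job started immediately emits 1860 g, the best contiguous shift (start at hour 4) emits 1360 g, and running only in the three cleanest hours emits 1060 g. Real workflows would also need to account for varying power per task, forecast error, and checkpoint overhead, which this sketch ignores.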