With the growing complexity and volume of genomic data, the high-performance computing (HPC) capacity of biology institutions is reaching its limits. While the cloud offers a viable solution for short-term elasticity, its intricacies pose challenges for bioinformatics users. Serverless computing, by contrast, allows workloads to scale with minimal developer burden. However, porting a scientific application to serverless is not straightforward. In this article, we present a variant calling genomics pipeline migrated from single-node HPC to a serverless architecture. We describe the inherent challenges of this approach and the engineering effort required to achieve scalability. We open-source the pipeline both as a basis for future systems research and as a scalable, user-friendly tool for the bioinformatics community.