The advent of next-generation sequencing (NGS) has revolutionized genomic research by enabling cost-effective, high-throughput sequencing of a diverse range of organisms. This breakthrough has unleashed a "Cambrian explosion" in the volume and diversity of genomic data, placing genomics among the top four big data challenges anticipated for this decade. In this context, pairwise sequence alignment is a highly time- and energy-intensive step in common bioinformatics pipelines. Speeding up this step requires heuristic approaches, optimized algorithms, and/or hardware acceleration. Although state-of-the-art CPU and GPU implementations have demonstrated significant performance gains, recent FPGA implementations have shown better energy efficiency. However, FPGA implementations often scale poorly with read length, because aligning longer sequences is constrained by the available hardware resources. In this work, we present a flexible FPGA-based accelerator template that scales to reads of up to 1000 bp. It implements Myers's algorithm to compute the exact unit-cost edit distance, using high-level synthesis and a worker-based architecture. GeneTEK, a set of instances of this accelerator template on a Xilinx Zynq UltraScale+ FPGA, achieves up to a 113% increase in execution speed and up to a 111x reduction in energy consumption compared to leading CPU and GPU solutions, while fitting comparison matrices up to 13x larger than previous FPGA-based systolic-array solutions. By following a SW-HW co-design approach, GeneTEK exploits parallelism at multiple levels and uses memory efficiently to deliver a scalable and accurate FPGA-based accelerator. These results reaffirm the potential of FPGAs as an energy-efficient platform for pairwise alignment of reads up to 1000 bp long.
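For readers unfamiliar with Myers's algorithm, the following is a minimal software sketch of its bit-parallel formulation for the global unit-cost edit distance, in the variant usually attributed to Hyyrö. It is not the GeneTEK hardware design: the function name `myers_edit_distance` is hypothetical, and Python's arbitrary-precision integers stand in for the wide bit vectors that a hardware implementation would have to split across fixed-width words.

```python
def myers_edit_distance(pattern: str, text: str) -> int:
    """Global unit-cost edit distance via Myers's bit-vector recurrence.

    Each column of the dynamic-programming matrix is encoded as two bit
    vectors (pv, mv) holding the +1 / -1 vertical differences, so one
    column is updated with a constant number of bitwise operations.
    """
    m = len(pattern)
    if m == 0:
        return len(text)

    mask = (1 << m) - 1   # keeps every vector m bits wide
    high = 1 << (m - 1)   # bit corresponding to the last pattern row

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0      # first column: D[i][0] = i, so all deltas are +1
    score = m             # D[m][0]

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = mv | (~(xh | pv) & mask)   # horizontal +1 differences
        mh = pv & xh                    # horizontal -1 differences

        # Track the bottom-row value D[m][j].
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shift in a 1: the top row D[0][j] = j grows by one per column
        # (global alignment rather than substring search).
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv

    return score
```

For example, `myers_edit_distance("kitten", "sitting")` returns 3. Because the whole column lives in a single integer here, the sketch sidesteps the multi-word carry handling that a fixed-width implementation for 1000 bp reads would need.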