Pairwise sequence alignment is a time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it moves a large amount of data, has a large memory footprint, and performs some unnecessary work. We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements that reduce the data movement, memory footprint, and number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1x, 1.7x, and 2.1x speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0x, 80.4x, 6.8x, 12.6x, and 5.9x speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate that an ASIC implementation of Scrooge would use 3.6x less chip area and 2.1x less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge. Availability: https://github.com/CMU-SAFARI/Scrooge