With advances in genome sequencing technology, deoxyribonucleic acid (DNA) sequencing reads are rapidly growing longer while prices fall lower than ever. However, the longer reads come at the cost of a heavy computational burden when aligning them. For example, aligning sequences to a human reference genome can take tens or even hundreds of hours. The current de facto standard approach to alignment is based on the guided dynamic programming method. Although this method is slow and could benefit from high-throughput graphics processing units (GPUs), existing GPU-accelerated approaches often compromise the algorithm's structure because its computational pattern is GPU-unfriendly. Unfortunately, such compromises are not tolerable in the field, because sequence alignment is one part of complicated bioinformatics analysis pipelines. To address this, we propose AGAThA, an exact and efficient GPU-based acceleration of guided sequence alignment. We diagnose and address the causes of the algorithm's GPU-unfriendliness, namely strided or redundant memory accesses and workload imbalances that are difficult to predict. In experiments on modern GPUs, AGAThA achieves an 18.8$\times$ speedup over the CPU-based baseline, 9.6$\times$ over the best GPU-based baseline, and 3.6$\times$ over GPU-based algorithms with different heuristics.