Genome analysis has revolutionized fields such as personalized medicine and forensics. Modern sequencing machines generate vast numbers of fragmented strings of genome data called reads. Aligning these reads into a complete DNA sequence of an organism (the read-mapping process) requires extensive data transfer between processing units and memory, which creates execution bottlenecks. Prior studies have primarily focused on accelerating specific stages of the read-mapping task. In contrast, this paper introduces DART-PIM, a holistic framework that uses digital processing-in-memory (PIM) to accelerate the read-mapping process end to end: from indexing with a unique data organization schema, through filtering, to read alignment with an optimized Wagner-Fischer algorithm. A comprehensive performance evaluation with real genomic data shows that, compared to state-of-the-art GPU and PIM implementations, DART-PIM improves throughput by 5.7x and 257x and energy efficiency by 92x and 27x, respectively.
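For readers unfamiliar with the alignment step, the following is a minimal sketch of the textbook Wagner-Fischer edit-distance recurrence with unit costs, written in Python. It is not the optimized in-memory variant used by DART-PIM, whose details are not given here; the function name `wagner_fischer` and the two-row layout are illustrative assumptions.

```python
def wagner_fischer(read: str, reference: str) -> int:
    """Textbook Wagner-Fischer edit distance with unit costs.

    Illustrative only: this is NOT DART-PIM's optimized variant.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: turning an empty read prefix into reference[:j] takes j insertions.
    prev = list(range(len(reference) + 1))

    for i, read_char in enumerate(read, start=1):
        # Column 0: turning read[:i] into an empty string takes i deletions.
        curr = [i]
        for j, ref_char in enumerate(reference, start=1):
            substitution = prev[j - 1] + (read_char != ref_char)  # match costs 0
            deletion = prev[j] + 1       # drop read_char
            insertion = curr[j - 1] + 1  # insert ref_char
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1]


print(wagner_fischer("ACGT", "AGT"))  # → 1 (delete the 'C')
```

Each table cell depends only on its left, upper, and upper-left neighbours, so the computation is regular and data-local. That property is commonly cited as the reason this recurrence suits hardware acceleration, although the specific mapping onto PIM arrays is beyond this sketch.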