Genome analysis has revolutionized fields such as personalized medicine and forensics. Modern sequencing machines generate vast numbers of fragmented strings of genome data called reads. Aligning these reads into the complete DNA sequence of an organism (the read-mapping process) requires extensive data transfer between processing units and memory, which creates execution bottlenecks. Prior studies have primarily focused on accelerating specific stages of the read-mapping task. In contrast, this paper introduces DART-PIM, a holistic framework that accelerates the entire read-mapping process. DART-PIM uses digital processing-in-memory (PIM) to accelerate read mapping end to end, from indexing with a unique data organization schema to filtering and read alignment with an optimized Wagner-Fischer algorithm. A comprehensive performance evaluation with real genomic data shows that DART-PIM improves throughput by 5.7x and 257x and energy efficiency by 92x and 27x compared to state-of-the-art GPU and PIM implementations, respectively.
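
For readers unfamiliar with the alignment step, the following is a minimal sketch of the textbook Wagner-Fischer edit-distance recurrence that the abstract names. It is not the optimized variant used in DART-PIM, whose details are not given here; the function name `wagner_fischer`, the unit costs for insertion, deletion, and substitution, and the two-row memory layout are illustrative assumptions.

```python
def wagner_fischer(read: str, ref: str) -> int:
    """Textbook Wagner-Fischer edit distance between a read and a reference segment.

    Assumption: unit cost for insertion, deletion, and substitution.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: turning an empty read prefix into ref[:j] takes j insertions.
    prev = list(range(len(ref) + 1))
    for i, read_base in enumerate(read, start=1):
        # Column 0: turning read[:i] into an empty string takes i deletions.
        curr = [i] + [0] * len(ref)
        for j, ref_base in enumerate(ref, start=1):
            substitution_cost = 0 if read_base == ref_base else 1
            curr[j] = min(
                prev[j] + 1,                      # delete read_base
                curr[j - 1] + 1,                  # insert ref_base
                prev[j - 1] + substitution_cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A read that differs from the reference segment by one missing base.
    print(wagner_fischer("ACGT", "AGT"))
```

Each table cell depends only on its left, upper, and upper-left neighbours, so the cells along an anti-diagonal can be computed independently. That regular dependency pattern is a common reason this recurrence is a target for parallel hardware, although how DART-PIM exploits it is not stated in the abstract.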