ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

Can Firtina, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie Kim, Taha Shahroodi, Meryem Banu Cavlak, Joel Lindegger, Mohammed Alser, Juan Gómez Luna, Sreenivas Subramoney, Onur Mutlu

From arXiv; accepted to ACM TACO.

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures whose states and edges are assigned probabilities. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, both optimizes these probabilities and uses them to compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both the computational and energy overheads of the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x over CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications, 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, and 1.96x, respectively.
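
As a rough illustration of how probabilities on a graph yield a similarity score, the sketch below implements the forward pass used inside the Baum-Welch algorithm on a toy two-state HMM and returns the probability of a sequence under the model. It is a minimal sketch under stated assumptions, not ApHMM's design. The model, the function name `forward`, and the `threshold` parameter are hypothetical. Real pHMMs use match, insertion, and deletion states (silent deletion states are omitted here), and full Baum-Welch also needs a backward pass and parameter re-estimation. The optional threshold drops forward values at or below a cutoff as a software analogue of filtering out negligible computations. ApHMM's hardware filter may select values differently.

```python
from typing import Dict, Sequence

# Hypothetical toy model type: probability tables keyed by state/symbol names.
Table = Dict[str, Dict[str, float]]


def forward(seq: Sequence[str], states: Sequence[str], init: Dict[str, float],
            trans: Table, emit: Table, threshold: float = 0.0) -> float:
    """Return P(seq | model) computed with the forward algorithm.

    Forward values at or below `threshold` are dropped after each step
    (hypothetical pruning rule). With threshold=0.0 only zero entries are
    dropped, so the result is exact.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")

    # Initialization: start in each state and emit the first symbol.
    alpha = {s: init[s] * emit[s].get(seq[0], 0.0) for s in states}
    alpha = {s: p for s, p in alpha.items() if p > threshold}

    # Recursion: each step reuses only the previous column of forward values
    # (dynamic programming), never recomputing earlier prefixes.
    for symbol in seq[1:]:
        nxt = {}
        for t in states:
            incoming = sum(p * trans[s].get(t, 0.0) for s, p in alpha.items())
            value = incoming * emit[t].get(symbol, 0.0)
            if value > threshold:  # filter out negligible values
                nxt[t] = value
        alpha = nxt

    # Termination: total probability over all states after the last symbol.
    return sum(alpha.values())


# Toy two-state model (illustrative numbers only, not from the paper).
STATES = ["M", "I"]
INIT = {"M": 0.9, "I": 0.1}
TRANS = {"M": {"M": 0.8, "I": 0.2}, "I": {"M": 0.5, "I": 0.5}}
EMIT = {"M": {"A": 0.7, "C": 0.3}, "I": {"A": 0.5, "C": 0.5}}

if __name__ == "__main__":
    exact = forward("AC", STATES, INIT, TRANS, EMIT)
    pruned = forward("AC", STATES, INIT, TRANS, EMIT, threshold=0.06)
    print(f"exact={exact:.4f} pruned={pruned:.4f}")  # prints: exact=0.2342 pruned=0.2142
```

With `threshold=0.0` the result matches summing over every state path. A positive threshold trades some accuracy for fewer multiply-accumulate operations, because pruned states no longer contribute to the next step.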

翻译：剖面隐马尔可夫模型（pHMMs）被广泛应用于各类生物信息学应用中，用于识别生物序列（如DNA或蛋白质序列）之间的相似性。在pHMMs中，序列以图结构表示。这些概率随后用于计算序列与pHMM图之间的相似性得分。Baum-Welch算法作为一种普遍采用且高度精确的方法，利用这些概率来优化并计算相似性得分。然而，Baum-Welch算法计算密集，现有解决方案仅提供采用固定pHMM设计的纯软件或纯硬件方法。我们认识到对灵活、高性能且节能的软硬件协同设计的迫切需求，以解决pHMMs中Baum-Welch算法的主要低效问题。我们提出ApHMM，这是首个旨在显著降低pHMMs的Baum-Welch算法相关计算和能量开销的灵活加速框架。ApHMM通过以下方式解决Baum-Welch算法的主要低效问题：1）设计灵活硬件以支持多种pHMM设计；2）通过采用记忆化技术的片上存储器利用可预测的数据依赖模式；3）使用基于硬件的过滤器快速过滤可忽略的计算；4）最小化冗余计算。与Baum-Welch算法的CPU、GPU和FPGA实现相比，ApHMM分别实现了15.55倍至260.03倍、1.83倍至5.34倍和27.97倍的显著加速。在三个关键生物信息学应用中——1）纠错、2）蛋白质家族搜索和3）多序列比对——ApHMM分别比最先进的CPU实现快1.29倍至59.94倍、1.03倍至1.75倍和1.03倍至1.95倍，同时将能量效率分别提升64.24倍至115.46倍、1.75倍和1.96倍。