This paper describes a new asynchronous algorithm and implementation for k-mer counting (KC), the problem of quantifying the frequency of each length-k substring in a DNA sequence. KC is common to many computational biology workloads and can take up to 77% of the total runtime of de novo genome assembly. The performance and scalability of the current state-of-the-art distributed-memory KC algorithm are hampered by multiple rounds of many-to-many collectives. We therefore develop an asynchronous algorithm (DAKC) that uses fine-grained, asynchronous messages to obviate most of this global communication while using network bandwidth efficiently via custom message aggregation protocols. DAKC strong-scales up to 256 nodes (512 sockets / 6K cores) and counts k-mers up to 9x faster than the state-of-the-art distributed-memory algorithm and up to 100x faster than the shared-memory alternative. We also provide an analytical model of the hardware resource utilization of our asynchronous KC algorithm and offer insights into its performance.
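To make the problem statement concrete, the following is a minimal single-process sketch of k-mer counting, together with a toy illustration of why aggregating fine-grained messages reduces the number of network sends. It is not the DAKC implementation: the function names, the CRC32-based owner assignment, and the `num_owners` and `buffer_size` parameters are all assumptions introduced here for illustration, and the "owners" are simulated in one process rather than distributed across nodes.

```python
import zlib
from collections import Counter


def kmers(sequence, k):
    """Yield every length-k substring (k-mer) of `sequence`, left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def count_kmers(sequence, k):
    """Reference serial counter: map each k-mer to its frequency."""
    return Counter(kmers(sequence, k))


def count_kmers_aggregated(sequence, k, num_owners=4, buffer_size=3):
    """Toy model of owner-partitioned counting with message aggregation.

    Hypothetical scheme (an assumption, not the paper's protocol): each
    k-mer is assigned to an owner by hashing, appended to a per-owner
    buffer, and the buffer is "sent" as one message when it fills up.
    Returns the merged counts and the number of simulated messages.
    """
    tables = [Counter() for _ in range(num_owners)]   # one count table per owner
    buffers = [[] for _ in range(num_owners)]         # outgoing buffer per owner
    messages_sent = 0

    def flush(owner):
        nonlocal messages_sent
        if buffers[owner]:
            tables[owner].update(buffers[owner])      # owner applies the batch
            buffers[owner].clear()
            messages_sent += 1

    for kmer in kmers(sequence, k):
        # zlib.crc32 is deterministic across runs, unlike built-in str hashing.
        owner = zlib.crc32(kmer.encode("ascii")) % num_owners
        buffers[owner].append(kmer)
        if len(buffers[owner]) >= buffer_size:
            flush(owner)

    for owner in range(num_owners):                   # drain partially filled buffers
        flush(owner)

    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged, messages_sent
```

For example, `count_kmers("ACGTACGT", 3)` yields counts of 2 for `ACG` and `CGT` and 1 for `GTA` and `TAC`. The aggregated variant returns the same counts while issuing at most one simulated message per k-mer, and fewer whenever a buffer fills before it is flushed.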