The de Bruijn graph is one of the most important data structures in de novo genome assembly algorithms, especially for next-generation sequencing (NGS) data. As the number of cores in modern computers increases, so does the need for parallel data structures and algorithms. Assembly is an indispensable step in sequencing the genomes of new organisms and in studying structural genomic changes. In recent years, the rapid development of NGS methods has raised hopes of making whole-genome sequencing a fast and reliable tool for use in, for example, medical diagnostics. However, this is hampered by the slowness and computational requirements of current processing algorithms, which creates a need for more efficient ones. One possible approach, still little explored, is the use of quantum computing. We created a lock-free version of the de Bruijn graph, together with a lock-free algorithm that builds the graph from reads. Our algorithm and data structures are designed for parallel threads of execution and use no mutexes or other locking mechanisms; instead, they rely only on the compare-and-swap instruction and other atomic operations. This makes the algorithm fast and allows it to scale efficiently. This article presents the new lock-free de Bruijn graph data structure and the accompanying graph construction algorithm. We developed a C++ library and tested its performance to demonstrate its speed and scalability compared to other available tools.
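To make the idea of building a de Bruijn graph from reads with only compare-and-swap and other atomic operations more concrete, here is a minimal C++ sketch. It is not the library described above: the class `LockFreeDbg`, its methods, the helper `build_parallel`, the fixed-capacity open-addressing table, and the 2-bit k-mer encoding (which limits k to 31) are all assumptions made for illustration. Each k-mer claims a slot with a single compare-and-swap, and outgoing edges are recorded as a 4-bit mask updated with an atomic OR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch of a lock-free de Bruijn graph; names and layout are assumptions.
class LockFreeDbg {
public:
    LockFreeDbg(std::size_t capacity_pow2, unsigned k)
        : keys_(capacity_pow2), out_(capacity_pow2), mask_(capacity_pow2 - 1),
          k_(k), kmer_mask_((std::uint64_t{1} << (2 * k)) - 1) {
        assert(k >= 1 && k <= 31);  // 2 bits per base; all-ones stays free as "empty"
        assert(capacity_pow2 != 0 && (capacity_pow2 & mask_) == 0);
        for (auto& s : keys_) s.store(kEmpty, std::memory_order_relaxed);
        for (auto& e : out_) e.store(0, std::memory_order_relaxed);
    }

    static int code(char c) {
        switch (c) {
            case 'A': return 0; case 'C': return 1;
            case 'G': return 2; case 'T': return 3;
            default:  return -1;
        }
    }

    // Encodes a string of A/C/G/T (assumed valid) into 2 bits per base.
    static std::uint64_t encode(const std::string& s) {
        std::uint64_t v = 0;
        for (char c : s) v = (v << 2) | static_cast<std::uint64_t>(code(c));
        return v;
    }

    // Adds every k-mer of the read as a node and links consecutive k-mers.
    void add_read(const std::string& read) {
        std::uint64_t kmer = 0;
        unsigned len = 0;
        bool have_prev = false;
        std::size_t prev = 0;
        for (char c : read) {
            int b = code(c);
            if (b < 0) { len = 0; kmer = 0; have_prev = false; continue; }
            kmer = ((kmer << 2) | static_cast<std::uint64_t>(b)) & kmer_mask_;
            if (len < k_) ++len;
            if (len < k_) continue;
            if (have_prev)  // record edge prev -> kmer, labelled by the appended base
                out_[prev].fetch_or(static_cast<std::uint8_t>(1u << b),
                                    std::memory_order_relaxed);
            prev = find_or_insert(kmer);
            have_prev = true;
        }
    }

    std::size_t node_count() const {
        std::size_t n = 0;
        for (const auto& s : keys_) n += (s.load(std::memory_order_acquire) != kEmpty);
        return n;
    }

    // Bit b is set if an edge labelled with base b leaves the k-mer; 0 if absent.
    std::uint8_t out_mask(std::uint64_t kmer) const {
        std::size_t i = hash(kmer) & mask_;
        for (std::size_t p = 0; p <= mask_; ++p, i = (i + 1) & mask_) {
            std::uint64_t v = keys_[i].load(std::memory_order_acquire);
            if (v == kmer) return out_[i].load(std::memory_order_relaxed);
            if (v == kEmpty) break;
        }
        return 0;
    }

private:
    static constexpr std::uint64_t kEmpty = ~std::uint64_t{0};
    static std::size_t hash(std::uint64_t x) {
        return static_cast<std::size_t>((x * 0x9E3779B97F4A7C15ull) >> 32);
    }

    // Linear probing; a slot is claimed with one compare-and-swap, never a lock.
    std::size_t find_or_insert(std::uint64_t kmer) {
        std::size_t i = hash(kmer) & mask_;
        for (std::size_t p = 0; p <= mask_; ++p, i = (i + 1) & mask_) {
            std::uint64_t expected = kEmpty;
            if (keys_[i].compare_exchange_strong(expected, kmer,
                                                 std::memory_order_acq_rel))
                return i;                      // this thread inserted the k-mer
            if (expected == kmer) return i;    // another thread already did
        }
        throw std::runtime_error("k-mer table is full");
    }

    std::vector<std::atomic<std::uint64_t>> keys_;
    std::vector<std::atomic<std::uint8_t>> out_;
    std::size_t mask_;
    unsigned k_;
    std::uint64_t kmer_mask_;
};

// Splits the reads across n_threads worker threads in round-robin order.
void build_parallel(LockFreeDbg& g, const std::vector<std::string>& reads,
                    unsigned n_threads) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t)
        pool.emplace_back([&g, &reads, t, n_threads] {
            for (std::size_t i = t; i < reads.size(); i += n_threads)
                g.add_read(reads[i]);
        });
    for (auto& th : pool) th.join();
}
```

In this sketch the table never grows and reverse complements are ignored, which a practical assembler would have to address. The point is only that threads racing to insert the same k-mer resolve the conflict through the compare-and-swap result, so duplicate reads processed concurrently still produce each node exactly once.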