The translation of comparative genomics into clinical decision support tools often depends on the quality of sequence alignments. However, current multiple sequence alignment (MSA) methods suffer from significant biases and have difficulty aligning diverged sequences. The objective of this study was to develop and test a new MSA algorithm suitable for the high-throughput comparative analysis of different microbial genomes. The algorithm employs an innovative tensor indexing method to partition the dynamic programming hypercube space for parallel processing. We used the clinically relevant task of identifying regions that determine resistance to antibiotics to test the new algorithm and to compare its performance with existing MSA methods. The new method, "mmDst", performed better than existing MSA algorithms on more divergent sequences because it employs a simultaneous alignment scoring recurrence that effectively approximates the scores of missing edge cells that fall outside the scoring region.
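To make the idea of partitioning a dynamic programming hypercube more concrete, the following is a minimal sketch in Python. It is not the mmDst implementation, and the abstract does not specify the tensor indexing scheme, so a generic anti-diagonal wavefront partition is swapped in here as an assumption. The function names (`wavefronts`, `sp_column_score`, `msa_score`) and the scoring parameters (`match`, `mismatch`, `gap`) are hypothetical and chosen for illustration only. The sketch scores all sequences simultaneously with a sum-of-pairs recurrence over the full hypercube, which is only practical for very short inputs.

```python
from itertools import product


def wavefronts(shape):
    """Group the cells of a k-dimensional DP grid by index sum.

    Assumption: a plain anti-diagonal partition, not the paper's tensor
    indexing. Every cell depends only on cells with a smaller index sum,
    so all cells within one wave could be processed in parallel.
    """
    waves = [[] for _ in range(sum(n - 1 for n in shape) + 1)]
    for idx in product(*(range(n) for n in shape)):
        waves[sum(idx)].append(idx)
    return waves


def sp_column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column; None marks a gap.

    The scoring values are illustrative, not taken from the paper.
    """
    total = 0
    for i in range(len(column)):
        for j in range(i + 1, len(column)):
            a, b = column[i], column[j]
            if a is None and b is None:
                continue  # gap against gap contributes nothing
            if a is None or b is None:
                total += gap
            else:
                total += match if a == b else mismatch
    return total


def msa_score(seqs):
    """Optimal sum-of-pairs score for aligning all sequences at once."""
    k = len(seqs)
    shape = tuple(len(s) + 1 for s in seqs)
    # Each move advances a non-empty subset of the sequences by one symbol.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    score = {(0,) * k: 0}
    for wave in wavefronts(shape)[1:]:
        for idx in wave:  # cells in the same wave are mutually independent
            best = None
            for m in moves:
                prev = tuple(i - d for i, d in zip(idx, m))
                if min(prev) < 0:
                    continue  # predecessor lies outside the hypercube
                column = [seqs[t][idx[t] - 1] if m[t] else None
                          for t in range(k)]
                cand = score[prev] + sp_column_score(column)
                if best is None or cand > best:
                    best = cand
            score[idx] = best
    return score[tuple(len(s) for s in seqs)]
```

For example, `msa_score(["ACGT", "ACGT", "ACGT"])` visits the waves of a 5 × 5 × 5 grid in order. The inner loop over `wave` is the part a parallel implementation would distribute across workers, since no cell in a wave reads another cell from the same wave.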