We study the problem of estimating the mutation rate between two sequences from noisy sequencing reads. Existing alignment-free methods typically assume direct access to the full sequences. We extend these methods to the sequencing framework, where only noisy reads from the sequences are observed. We use a simple model in which both mutations and sequencing errors are substitutions. We propose multiple estimators, provide theoretical guarantees for one of them, and evaluate the others through simulations.