Many forensic genetic trace samples are of too low quality to yield short tandem repeat (STR) DNA profiles because the nuclear DNA they contain is highly degraded (e.g., telogen hairs). For such samples, shotgun DNA sequencing can instead provide valuable information on, e.g., single nucleotide polymorphism (SNP) markers. As a result, shotgun sequencing is gaining attention in forensic genetics, and statistical models that correctly interpret such evidence, including properly accounting for sequencing errors, are needed. One such model is the wgsLR model by Andersen et al. (2025), which evaluates the evidential strength of a comparison between the genotypes of a trace sample and a reference sample, assuming a single contributor to each sample. This paper extends the wgsLR model to allow different (asymmetric) genotyping error probabilities for the two samples (e.g., a low-quality trace sample and a high-quality reference sample). The model is also extended to handle unknown genotyping error probabilities via a prior distribution. The sensitivity of the wgsLR model to overdispersion was also investigated, and the model was found to be robust to it. Furthermore, integrating out the unknown genotyping error probability of the trace sample gave concordant weight of evidence (WoE) under both hypotheses (that the same individual donated both the trace and the reference sample, and that two different individuals donated them). Using a trace sample genotyping error probability that is too small was found to be more conservative than using one that is too large, because a too-large error probability can attribute genotype inconsistencies to errors rather than to the trace and reference samples having different donors. The extensions of the model are implemented in the R package wgsLR.
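To make the likelihood ratio concrete, here is a minimal Python sketch. It is not the wgsLR R package's implementation. Several things are assumptions rather than details from the paper: an allele-level error model in which each of the two alleles is independently miscalled with probability w, Hardy-Weinberg genotype frequencies, independent SNPs, and a Beta(a, b) prior on the trace-sample error probability `w_t`, which is integrated out numerically with a midpoint rule. All function names are hypothetical. Asymmetry is just two separate error probabilities, `w_t` for the trace and `w_r` for the reference.

```python
import math

def error_matrix(w):
    """P(observed genotype j | true genotype i), ASSUMED allele-flip model:
    each of the two alleles is independently miscalled with probability w.
    Genotypes are coded 0, 1, 2 = number of copies of the alternative allele."""
    m = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        # Alt alleles that stay alt ~ Binomial(i, 1 - w);
        # ref alleles that flip to alt ~ Binomial(2 - i, w).
        for kept in range(i + 1):
            p_kept = math.comb(i, kept) * (1 - w) ** kept * w ** (i - kept)
            for flipped in range(2 - i + 1):
                p_flip = math.comb(2 - i, flipped) * w ** flipped * (1 - w) ** (2 - i - flipped)
                m[i][kept + flipped] += p_kept * p_flip
    return m

def hwe(p):
    """Hardy-Weinberg genotype probabilities for alternative-allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def locus_likelihoods(x_t, x_r, p, w_t, w_r):
    """Return (P(x_t, x_r | Hp), P(x_t, x_r | Hd)) for one SNP.
    Hp: one true genotype z underlies both samples; Hd: two unrelated donors."""
    prior = hwe(p)
    e_t, e_r = error_matrix(w_t), error_matrix(w_r)
    hp = sum(prior[z] * e_t[z][x_t] * e_r[z][x_r] for z in range(3))
    trace = sum(prior[z] * e_t[z][x_t] for z in range(3))
    ref = sum(prior[z] * e_r[z][x_r] for z in range(3))
    return hp, trace * ref

def log10_lr(loci, w_t, w_r):
    """WoE = log10(LR) summed over independent loci; loci = [(x_t, x_r, p), ...]."""
    total = 0.0
    for x_t, x_r, p in loci:
        hp, hd = locus_likelihoods(x_t, x_r, p, w_t, w_r)
        total += math.log10(hp) - math.log10(hd)
    return total

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of log-values."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_beta_pdf(w, a, b):
    """Log density of a Beta(a, b) distribution at w in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(w) + (b - 1) * math.log(1 - w) - log_norm

def log10_lr_integrated(loci, w_r, a, b, n_grid=2000):
    """Integrate the unknown, shared w_t out of the JOINT likelihood under each
    hypothesis separately (midpoint rule on (0, 1)), then take the ratio."""
    log_hp, log_hd = [], []
    for k in range(n_grid):
        w_t = (k + 0.5) / n_grid
        lp = ld = log_beta_pdf(w_t, a, b)
        for x_t, x_r, p in loci:
            hp, hd = locus_likelihoods(x_t, x_r, p, w_t, w_r)
            lp += math.log(hp)
            ld += math.log(hd)
        log_hp.append(lp)
        log_hd.append(ld)
    # The grid spacing 1/n_grid cancels in the ratio of the two integrals.
    return (logsumexp(log_hp) - logsumexp(log_hd)) / math.log(10)
```

Because `w_t` is shared by all loci of the trace sample, the sketch integrates over the product of per-locus likelihoods rather than averaging per-locus LRs. For a single mismatching locus such as `[(2, 0, 0.3)]`, `log10_lr` returns a larger value with `w_t=0.3` than with `w_t=0.01`. This illustrates why overstating the trace error probability is the less conservative choice.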