Many forensic genetic trace samples are of too low quality to yield short tandem repeat (STR) DNA profiles because the nuclear DNA they contain is highly degraded (e.g., telogen hairs). Shotgun DNA sequencing of such samples can instead provide valuable information on, e.g., single nucleotide polymorphism (SNP) markers. As a result, shotgun sequencing is gaining attention in forensic genetics, and statistical models are needed to interpret such evidence correctly, including properly accounting for sequencing errors. One such model is the wgsLR model by Andersen et al. (2025), which evaluates the evidential strength of a comparison between the genotypes in a trace sample and a reference sample, assuming a single-source contribution to both samples. This paper extends the wgsLR model to allow for different (asymmetric) genotyping error probabilities (e.g., for a low-quality trace sample and a high-quality reference sample). The model was also extended to handle unknown genotyping error probabilities, both by maximising the profile likelihood and by using a prior distribution. The sensitivity of the wgsLR model to overdispersion was also investigated, and the model was found to be robust against it. Given a sufficient number of independent markers, handling an unknown trace sample genotyping error probability with these methods gave concordant weight of evidence (WoE) under both hypotheses (the same individual or different individuals being the donors of the trace sample and the reference sample). Using a trace sample genotyping error probability that is too low was found to be more conservative than using one that is too high, because the latter can explain genotype inconsistencies as errors rather than as the result of two different individuals being the donors of the trace sample and the reference sample. The extensions of the model are implemented in the R package wgsLR.
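
To make the last point concrete, the following is a minimal Python sketch of a per-marker likelihood ratio with asymmetric genotyping error probabilities. It is not the wgsLR package's API and is not claimed to reproduce the paper's exact model. It assumes biallelic SNP genotypes coded 0, 1, 2, Hardy-Weinberg genotype frequencies with a hypothetical allele frequency `q`, and a toy error model in which each of the two alleles is miscalled independently with probability `w`. All function names and numeric values are illustrative.

```python
import math


def hwe_prior(q):
    """Genotype probabilities (0, 1, 2 alternate alleles) under Hardy-Weinberg (assumed)."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def obs_prob(x, g, w):
    """P(observed genotype x | true genotype g) under the toy error model:
    each allele is miscalled independently with probability w (an assumption)."""
    table = [
        [(1 - w) ** 2, 2 * w * (1 - w), w ** 2],          # true genotype 0
        [w * (1 - w), (1 - w) ** 2 + w ** 2, w * (1 - w)],  # true genotype 1
        [w ** 2, 2 * w * (1 - w), (1 - w) ** 2],          # true genotype 2
    ]
    return table[g][x]


def log10_lr(x_trace, x_ref, q, w_trace, w_ref):
    """log10 LR for one marker: same donor versus two unrelated donors."""
    prior = hwe_prior(q)
    # Same donor: one latent genotype, observed twice with different error rates.
    same = sum(
        prior[g] * obs_prob(x_trace, g, w_trace) * obs_prob(x_ref, g, w_ref)
        for g in range(3)
    )
    # Different donors: two independent latent genotypes.
    p_trace = sum(prior[g] * obs_prob(x_trace, g, w_trace) for g in range(3))
    p_ref = sum(prior[g] * obs_prob(x_ref, g, w_ref) for g in range(3))
    return math.log10(same / (p_trace * p_ref))


def woe(pairs, q, w_trace, w_ref):
    """Weight of evidence: sum of log10 LRs over markers assumed independent."""
    return sum(log10_lr(xt, xr, q, w_trace, w_ref) for xt, xr in pairs)


if __name__ == "__main__":
    q, w_ref = 0.25, 1e-4  # illustrative values only
    # An inconsistent marker: trace observed as 0, reference observed as 2.
    for w_trace in (0.001, 0.01, 0.1):
        print(w_trace, log10_lr(0, 2, q, w_trace, w_ref))
```

In this toy setting, the log10 LR for an inconsistent marker moves towards zero as the assumed trace error probability grows, so the inconsistency counts less against the same-donor hypothesis. This mirrors the abstract's argument that assuming too low an error probability is the more conservative choice.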


