Distinguishing between same-source and different-source hypotheses from various types of traces is a generic problem in forensic science. It is often tackled with Bayesian approaches, which provide a likelihood ratio quantifying the relative strength of the evidence for each of the two competing hypotheses. Here, we focus on distance-based approaches, whose robustness, and in particular whose capacity to handle high-dimensional evidence, vary widely and need to be evaluated and optimized. We present a unified framework covering direct methods, which estimate the likelihood of the distance between traces under each of the two competing hypotheses, and indirect methods, which use logistic regression to discriminate between same-source and different-source distance distributions. Whilst direct methods are more flexible, indirect methods are more robust and fit naturally into machine learning. Moreover, indirect methods can use a vectorial distance, which avoids the severe information loss suffered by scalar-distance approaches. Direct and indirect methods are compared in terms of sensitivity, specificity and robustness, with and without dimensionality reduction and with and without feature selection, on hand odor profiles, a novel and challenging type of forensic evidence. Empirical evaluations on a large panel of 534 subjects and their 1690 odor traces show that indirect methods are significantly superior, especially without dimensionality reduction, whether or not feature selection is applied.
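To make the distinction between the two families concrete, the following is a minimal sketch on synthetic data, not the implementation evaluated here. Everything in it is an assumption made for illustration: the Gaussian toy traces, the Euclidean scalar distance, the Gaussian kernel density estimate with a fixed bandwidth, the component-wise absolute difference used as vectorial distance, the plain gradient-descent fit, and all function names. The direct method takes the ratio of two estimated densities of a scalar distance; the indirect method converts the posterior odds of a logistic regression on the vectorial distance into a likelihood ratio by dividing out the training prior odds.

```python
import math
import random

rng = random.Random(0)
DIM = 5  # number of features per trace (toy value, an assumption)

def make_pairs(n_pairs, same_source):
    """Simulate trace pairs; return their component-wise absolute differences."""
    pairs = []
    for _ in range(n_pairs):
        src_a = [rng.gauss(0.0, 2.0) for _ in range(DIM)]
        src_b = src_a if same_source else [rng.gauss(0.0, 2.0) for _ in range(DIM)]
        trace_a = [m + rng.gauss(0.0, 0.5) for m in src_a]
        trace_b = [m + rng.gauss(0.0, 0.5) for m in src_b]
        pairs.append([abs(a - b) for a, b in zip(trace_a, trace_b)])
    return pairs

def euclidean(vec):
    """Collapse a vectorial distance into a scalar one."""
    return math.sqrt(sum(v * v for v in vec))

def gaussian_kde(sample, bandwidth):
    """Return a function estimating the density of `sample` at a point."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    return density

def direct_lr(d, same_dists, diff_dists, bandwidth=0.5):
    """Direct method: ratio of the two estimated scalar-distance densities."""
    return gaussian_kde(same_dists, bandwidth)(d) / gaussian_kde(diff_dists, bandwidth)(d)

def fit_logistic(X, y, step=0.1, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - step * g / n for wj, g in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b

def indirect_lr(vec, w, b, prior_odds):
    """Indirect method: posterior odds from the classifier over training prior odds."""
    z = b + sum(wj * xj for wj, xj in zip(w, vec))
    return math.exp(z) / prior_odds

same_vecs = make_pairs(150, same_source=True)
diff_vecs = make_pairs(150, same_source=False)

# Direct method on scalar (Euclidean) distances.
same_d = [euclidean(v) for v in same_vecs]
diff_d = [euclidean(v) for v in diff_vecs]
lr_near = direct_lr(1.5, same_d, diff_d)
lr_far = direct_lr(7.0, same_d, diff_d)

# Indirect method on vectorial distances (label 1 = same source).
X = same_vecs + diff_vecs
y = [1] * len(same_vecs) + [0] * len(diff_vecs)
w, b = fit_logistic(X, y)
prior_odds = len(same_vecs) / len(diff_vecs)
ind_near = indirect_lr([0.3] * DIM, w, b, prior_odds)
ind_far = indirect_lr([3.0] * DIM, w, b, prior_odds)

print("direct LR   (near, far):", lr_near, lr_far)
print("indirect LR (near, far):", ind_near, ind_far)
```

In this toy setting both methods return a likelihood ratio above 1 for a small distance and below 1 for a large one. The sketch illustrates one difference between the two families: the indirect method takes the whole difference vector as input, whereas the direct method needs a density estimate of a scalar summary of it.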