Considerations on the Evaluation of Biometric Quality Assessment Algorithms

Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. "Error versus Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of the curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the "False Non-Match Rate" (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded in ascending order of quality, where the quality of a comparison is the lower of its two samples' quality scores, and the error is recomputed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details of this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real data for a face image quality assessment scenario, with a focus on general modality-independent conclusions for EDC evaluations.
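
The EDC and pAUC computation outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made for illustration: the function names `edc_fnmr` and `pauc_stepwise` are hypothetical, comparison scores are taken to be similarity scores of mated pairs (so a score below the threshold counts as a false non-match), comparisons with equal quality are discarded together, and the pAUC uses stepwise interpolation from a discard fraction of 0 up to the chosen limit.

```python
import numpy as np


def edc_fnmr(scores, quality_a, quality_b, threshold):
    """Sketch of an FNMR-based EDC curve for mated comparisons.

    Assumes similarity scores: a mated comparison with score < threshold
    is counted as a false non-match.
    Returns (discard_fractions, fnmr_values).
    """
    scores = np.asarray(scores, dtype=float)
    # Quality of a comparison = lower quality score of its two samples.
    pair_quality = np.minimum(quality_a, quality_b)
    order = np.argsort(pair_quality, kind="stable")  # lowest quality first
    pair_quality = pair_quality[order]
    errors = (scores[order] < threshold).astype(float)

    n = len(scores)
    # tail_errors[k] = number of errors left after discarding the k worst.
    tail_errors = np.cumsum(errors[::-1])[::-1]
    remaining = n - np.arange(n)
    # Assumption: equal-quality comparisons are discarded together, so the
    # curve only has points where the sorted quality value changes.
    is_cut = np.concatenate(([True], pair_quality[1:] != pair_quality[:-1]))
    k = np.flatnonzero(is_cut)
    return k / n, tail_errors[k] / remaining[k]


def pauc_stepwise(discard_fractions, errors, limit):
    """Area under the stepwise EDC curve over [0, limit]."""
    x = np.append(discard_fractions[discard_fractions < limit], limit)
    y = errors[: len(x) - 1]  # each step holds its value until the next cut
    return float(np.sum(np.diff(x) * y))


# Toy data (made up for illustration only).
scores = [0.2, 0.9, 0.8, 0.3, 0.7]
quality_a = [10, 90, 60, 20, 50]
quality_b = [30, 80, 70, 40, 20]
discard, fnmr = edc_fnmr(scores, quality_a, quality_b, threshold=0.5)
print(discard, fnmr, pauc_stepwise(discard, fnmr, limit=0.5))
```

A lower pAUC indicates that discarding low-quality comparisons reduces the error more quickly. The discard fraction limit passed as `limit` is the parameter whose choice the paper examines with respect to ranking stability.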
