Considerations on the Evaluation of Biometric Quality Assessment Algorithms

Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. "Error versus Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of the curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the "False Non-Match Rate" (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded based on the associated samples' lowest quality scores, and the error is computed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details of this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real data for a face image quality assessment scenario, with a focus on general modality-independent conclusions for EDC evaluations.
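
The EDC and pAUC computation described above can be illustrated with a minimal Python sketch. It is not the paper's reference implementation. It assumes FNMR as the error type, similarity scores for mated comparisons (higher means more similar), stepwise interpolation, and that comparisons with tied pairwise quality are discarded together. The function names `edc_curve` and `pauc_stepwise`, their parameters, and the toy data are hypothetical.

```python
import numpy as np


def edc_curve(quality_a, quality_b, scores, threshold):
    """Sketch of an FNMR-based EDC curve (assumed conventions, see lead-in).

    quality_a, quality_b: quality scores of the two samples in each mated pair.
    scores: similarity scores of the mated comparisons.
    threshold: a comparison with score < threshold counts as a false non-match.
    Returns (discard_fractions, fnmr_values).
    """
    # A comparison is represented by the lower of its two sample quality scores.
    pair_quality = np.minimum(np.asarray(quality_a, dtype=float),
                              np.asarray(quality_b, dtype=float))
    is_error = np.asarray(scores, dtype=float) < threshold

    # Sort so that the lowest-quality comparisons are discarded first.
    order = np.argsort(pair_quality, kind="stable")
    q_sorted = pair_quality[order]
    e_sorted = is_error[order]
    n = len(q_sorted)

    # remaining_errors[k] = number of errors left after discarding k comparisons.
    remaining_errors = e_sorted[::-1].cumsum()[::-1]

    # One curve point per distinct quality value q: discard everything below q.
    # The first point (q = minimum quality) discards nothing, giving the
    # starting error that is determined by the chosen threshold.
    discarded = np.searchsorted(q_sorted, np.unique(q_sorted), side="left")
    fnmr = remaining_errors[discarded] / (n - discarded)
    return discarded / n, fnmr


def pauc_stepwise(fractions, errors, limit):
    """Area under the stepwise EDC curve on the discard range [0, limit]."""
    # Each error value holds until the next curve point; the last one is
    # assumed to hold up to a discard fraction of 1.
    x = np.clip(np.append(fractions, 1.0), 0.0, limit)
    return float(np.sum(np.diff(x) * errors))


# Toy data: four mated comparisons, two of which are false non-matches.
fractions, fnmr = edc_curve([10, 25, 30, 40], [50, 20, 60, 70],
                            [0.2, 0.9, 0.3, 0.8], threshold=0.5)
pauc = pauc_stepwise(fractions, fnmr, limit=0.5)
```

A lower pAUC indicates that discarding low-quality comparisons reduces the error faster. Switching to linear interpolation would replace the step sum with a trapezoidal rule, which can change the resulting values and rankings.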
