Considerations on the Evaluation of Biometric Quality Assessment Algorithms

Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. "Error versus Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the "False Non-Match Rate" (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded based on the associated samples' lowest quality scores, and the error is computed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details of this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real face image and fingerprint data, with a focus on general modality-independent conclusions for EDC evaluations. Various EDC alternatives are discussed as well.
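The EDC and pAUC computation described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes FNMR as the error type, mated comparisons with similarity scores (a score below the threshold counts as a false non-match), a threshold passed in directly rather than derived from a target starting error, and stepwise interpolation. The names `edc_curve` and `pauc_stepwise` and the toy data are hypothetical.

```python
def edc_curve(scores, qualities_a, qualities_b, threshold):
    """Return (discard_fractions, fnmr_values) for a set of mated comparisons.

    Assumption: scores are similarity scores, so score < threshold is an error.
    """
    n = len(scores)
    # Each comparison is ranked by the lower quality score of its two samples.
    pair_quality = [min(a, b) for a, b in zip(qualities_a, qualities_b)]
    order = sorted(range(n), key=lambda i: pair_quality[i])

    errors_remaining = sum(1 for s in scores if s < threshold)
    fractions = [0.0]
    fnmrs = [errors_remaining / n]  # starting error at discard fraction 0

    i = 0
    while i < n:
        q = pair_quality[order[i]]
        # Discard all comparisons tied at the current lowest pairwise quality.
        while i < n and pair_quality[order[i]] == q:
            if scores[order[i]] < threshold:
                errors_remaining -= 1
            i += 1
        remaining = n - i
        if remaining == 0:
            break  # the error is undefined once everything is discarded
        fractions.append(i / n)
        fnmrs.append(errors_remaining / remaining)
    return fractions, fnmrs


def pauc_stepwise(fractions, errors, limit):
    """Area under the stepwise EDC curve over the discard range [0, limit]."""
    area = 0.0
    for k, (x, y) in enumerate(zip(fractions, errors)):
        if x >= limit:
            break
        x_next = fractions[k + 1] if k + 1 < len(fractions) else 1.0
        # The error is held constant until the next discard step.
        area += (min(x_next, limit) - x) * y
    return area


# Toy data (hypothetical): five mated comparisons with sample quality scores.
scores = [0.9, 0.2, 0.8, 0.3, 0.7]
qualities_a = [80, 10, 90, 40, 70]
qualities_b = [85, 30, 60, 20, 75]

fractions, fnmrs = edc_curve(scores, qualities_a, qualities_b, threshold=0.5)
pauc = pauc_stepwise(fractions, fnmrs, limit=0.4)
print(fractions, fnmrs, pauc)
```

In this toy case the two erroneous comparisons also have the lowest pairwise quality, so the FNMR falls from 0.4 to 0 once 40% of the comparisons are discarded. A lower pAUC at a given discard fraction limit indicates a quality assessment algorithm that removes errors earlier.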
