Comparing age estimation methods is challenging because published results are unreliable, owing to inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements over the past decade from specialized methods; however, our findings challenge these claims. This paper identifies two trivial yet persistent issues with the currently used evaluation protocol and describes how to resolve them. We describe our evaluation protocol in detail and provide concrete examples of how it should be applied. Using this protocol, we offer an extensive comparative analysis of state-of-the-art facial age estimation methods. Surprisingly, we find that the performance differences between the methods are negligible compared to the effect of other factors, such as facial alignment, facial coverage, image resolution, model architecture, or the amount of data used for pretraining. We use the insights gained to propose FaRL as the backbone model and demonstrate its efficiency. The results emphasize the importance of consistent data preprocessing practices for reliable and meaningful comparisons. Our source code is publicly available at https://github.com/paplhjak/Facial-Age-Estimation-Benchmark.