Comparing age estimation methods is difficult because published results are often unreliable, owing to inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements from specialized methods over the past decade; our findings challenge these claims. We argue that, for age estimation tasks outside the low-data regime, designing specialized methods is unnecessary, and the standard approach of training with a cross-entropy loss is sufficient. This paper addresses the benchmark shortcomings by evaluating state-of-the-art age estimation methods in a unified and comparable setting. We systematically analyze how facial alignment, facial coverage, image resolution, image representation, model architecture, and the amount of training data affect age estimation results. Surprisingly, these factors often influence the results more than the choice of the age estimation method itself. We assess the generalization capability of each method by evaluating its cross-dataset performance on publicly available age estimation datasets. The results emphasize the importance of consistent data preprocessing practices and standardized benchmarks for reliable and meaningful comparisons. The source code is available at https://github.com/paplhjak/Facial-Age-Estimation-Benchmark.
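The cross-entropy baseline mentioned above can be illustrated by treating each integer age as a separate class. The following is a minimal, framework-free sketch under that assumption. The function names, the argmax decoding rule, and the use of mean absolute error (MAE) as the evaluation metric are hypothetical choices for illustration; they are not necessarily what the benchmark code in the linked repository uses.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over per-age-class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], true_age: int) -> float:
    """Standard cross-entropy: negative log-probability of the true age class.

    Assumption: class index i corresponds to age i.
    """
    return -math.log(softmax(logits)[true_age])


def predict_age(logits: Sequence[float]) -> int:
    """Hypothetical decoding rule: return the most probable age class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def mean_absolute_error(preds: Sequence[float], targets: Sequence[float]) -> float:
    """MAE in years between predicted and ground-truth ages (assumed metric)."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


if __name__ == "__main__":
    # Toy logits for a 5-class age range (ages 0..4); the model favors age 1.
    logits = [0.1, 2.0, 0.5, -1.0, 0.0]
    print(predict_age(logits))           # most probable class
    print(cross_entropy(logits, 1))      # loss if the true age is 1
    print(mean_absolute_error([1, 3], [2, 3]))
```

Because the loss only requires a per-age probability distribution, it places no age-specific structure on the model. This is the sense in which the baseline is "standard" rather than specialized.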