Foundation models are predominantly trained in an unsupervised or self-supervised manner on highly diverse, large-scale datasets, making them broadly applicable to a variety of downstream tasks. In this work, we investigate for the first time whether such models are suitable for the specific domain of face recognition. We further propose and demonstrate how these models can be adapted for face recognition under different levels of data availability. Extensive experiments are conducted on multiple foundation models, with training and fine-tuning datasets of varying scales, and evaluation on a wide range of benchmarks. Our results indicate that, despite their versatility, pre-trained foundation models underperform on face recognition compared to similar architectures trained specifically for this task. However, fine-tuning foundation models yields promising results, often surpassing models trained from scratch when training data is limited. Even with access to large-scale face recognition training datasets, fine-tuned foundation models perform comparably to models trained from scratch, at a lower computational training cost and without assuming that extensive training data is available. Our analysis also examines bias in face recognition, observing slightly higher bias in some settings when foundation models are used.
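To make the adaptation pipeline concrete, the sketch below fine-tunes a pre-trained foundation backbone for face recognition. It is illustrative rather than the paper's method: the `timm` ViT backbone, the 512-d embedding neck, and the CosFace-style margin loss are assumptions standing in for whichever backbones and losses are actually evaluated.

```python
# Minimal sketch of adapting a foundation model to face recognition.
# Assumptions (not taken from the abstract): a timm ViT backbone and a
# CosFace-style margin-softmax head, a common choice for this task.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm


class CosFaceHead(nn.Module):
    """Margin-based classification head widely used in face recognition."""

    def __init__(self, embed_dim, num_ids, s=64.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_ids, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, emb, labels):
        # Cosine similarity between normalized embeddings and class centers.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        # Subtract margin m from the target-class logit, then scale by s.
        margin = torch.zeros_like(cos).scatter_(1, labels.unsqueeze(1), self.m)
        return F.cross_entropy(self.s * (cos - margin), labels)


class FaceModel(nn.Module):
    def __init__(self, num_ids, embed_dim=512):
        super().__init__()
        # Pre-trained foundation backbone; num_classes=0 yields pooled features.
        self.backbone = timm.create_model(
            "vit_base_patch16_224", pretrained=True, num_classes=0)
        self.neck = nn.Linear(self.backbone.num_features, embed_dim)
        self.head = CosFaceHead(embed_dim, num_ids)

    def forward(self, x, labels=None):
        emb = self.neck(self.backbone(x))
        if labels is None:                 # inference: return face embeddings
            return F.normalize(emb)
        return self.head(emb, labels)      # training: margin-softmax loss


# Usage: fine-tune end-to-end on aligned face crops (dummy batch shown).
model = FaceModel(num_ids=1000)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=5e-4)
imgs = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))
loss = model(imgs, labels)
loss.backward()
opt.step()
opt.zero_grad()
```

At inference time the head is discarded and identities are compared by cosine similarity of the normalized embeddings; the same skeleton covers the low-data regime (fine-tune the pre-trained backbone) and the from-scratch baseline (set `pretrained=False`).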