Automated face recognition has made rapid strides over the past decade, driven by the unprecedented rise of deep neural network (DNN) models that can be trained for domain-specific tasks. At the same time, foundation models pretrained on broad vision or vision-language tasks have shown impressive generalization across diverse domains, including biometrics. This raises an important question: do different DNN models, both domain-specific and foundation models, encode facial identity in similar ways despite differences in training data, loss functions, and architectures? To address this question, we directly analyze the geometric structure of the embedding spaces induced by different DNN models. Treating the embeddings of face images as point clouds, we study whether simple affine transformations can align the face representations of one model with those of another. Our findings reveal surprising cross-model compatibility: low-capacity linear mappings substantially improve cross-model face recognition over unaligned baselines in both face identification and face verification tasks. The alignment patterns generalize across datasets and vary systematically across model families, indicating representational convergence in how facial identity is encoded. These findings have implications for model interoperability, ensemble design, and biometric template security.
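The alignment idea above can be illustrated with a minimal sketch. It assumes the affine map is fit by ordinary least squares on paired embeddings of the same training images from both models, and that cross-model identification is scored by rank-1 accuracy under cosine similarity; the abstract does not specify either choice. The embeddings are synthetic (the hypothetical "model B" is simulated as a hidden affine function of "model A" plus noise), and the helpers `fit_affine`, `rank1_identification`, `model_b_from_a`, and `make_model_a_embeddings`, as well as all dimensions and noise levels, are illustrative assumptions rather than the authors' method.

```python
import numpy as np

# Hypothetical helper: fit tgt ≈ src @ W + b by ordinary least squares.
def fit_affine(src, tgt):
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # ones column folds the bias b into the solve
    sol, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return sol[:-1], sol[-1]  # W: (d_src, d_tgt), b: (d_tgt,)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Hypothetical helper: rank-1 identification accuracy using cosine similarity.
def rank1_identification(probe, gallery, probe_ids, gallery_ids):
    sims = l2_normalize(probe) @ l2_normalize(gallery).T
    predicted = gallery_ids[np.argmax(sims, axis=1)]
    return float(np.mean(predicted == probe_ids))

# Synthetic data (assumption): model B is a hidden affine function of model A plus noise.
rng = np.random.default_rng(0)
d = 128                                   # embedding dimension, same for both models here
M = rng.normal(size=(d, d)) / np.sqrt(d)  # hidden linear part
c = 0.5 * rng.normal(size=d)              # hidden offset

def model_b_from_a(emb_a):
    return emb_a @ M + c + 0.05 * rng.normal(size=emb_a.shape)

def make_model_a_embeddings(n_ids, per_id):
    protos = rng.normal(size=(n_ids, d))  # one prototype per identity
    ids = np.repeat(np.arange(n_ids), per_id)
    emb_a = protos[ids] + 0.2 * rng.normal(size=(ids.size, d))  # within-identity variation
    return ids, emb_a

# Fit the map on paired embeddings of training identities (disjoint from the test identities).
_, train_a = make_model_a_embeddings(n_ids=100, per_id=10)
W, b = fit_affine(train_a, model_b_from_a(train_a))

# Cross-model test: gallery enrolled with model B, probes embedded with model A.
test_ids, test_a = make_model_a_embeddings(n_ids=50, per_id=2)
test_b = model_b_from_a(test_a)
gallery, gallery_ids = test_b[0::2], test_ids[0::2]
probe, probe_ids = test_a[1::2], test_ids[1::2]

baseline = rank1_identification(probe, gallery, probe_ids, gallery_ids)
aligned = rank1_identification(probe @ W + b, gallery, probe_ids, gallery_ids)
print(f"unaligned rank-1: {baseline:.2f}  aligned rank-1: {aligned:.2f}")
```

Because the simulated relation between the two models is exactly affine, the aligned accuracy here should be close to perfect while the unaligned baseline stays near chance; real models are only approximately related, so the sketch demonstrates the mechanics of fitting and applying the map, not the paper's results.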