Face morphing attacks threaten biometric verification, yet most morphing attack detection (MAD) systems require task-specific training and generalize poorly to unseen attack types. Meanwhile, open-source multimodal large language models (MLLMs) have demonstrated strong visual-linguistic reasoning, but their potential in biometric forensics remains underexplored. In this paper, we present the first systematic zero-shot evaluation of open-source MLLMs for single-image MAD, using publicly available weights and a standardized, reproducible protocol. Across diverse morphing techniques, many MLLMs show non-trivial discriminative ability without any fine-tuning or domain adaptation, and LLaVA1.6-Mistral-7B achieves state-of-the-art performance, outperforming highly competitive task-specific MAD baselines by at least 23% in equal error rate (EER). These results indicate that multimodal pretraining can implicitly encode the fine-grained facial inconsistencies indicative of morphing artifacts, enabling zero-shot forensic sensitivity. Our findings position open-source MLLMs as reproducible, interpretable, and competitive foundations for biometric security and forensic image analysis. This emergent capability also opens new opportunities to build state-of-the-art MAD systems through targeted fine-tuning or lightweight adaptation, further improving accuracy and efficiency while preserving interpretability. To support future research, all code and evaluation protocols will be released upon publication.
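For readers unfamiliar with the headline metric, the equal error rate (EER) is the operating point at which the false acceptance rate (morphs wrongly accepted) equals the false rejection rate (bona fide images wrongly rejected); a lower EER is better. The following is a minimal illustrative sketch of how EER can be computed from detector scores. The score distributions, label convention (1 = bona fide, 0 = morph), and threshold sweep are assumptions for illustration, not the paper's actual evaluation protocol.

```python
import numpy as np

def compute_eer(labels, scores):
    """Equal error rate: the point where the false acceptance rate (FAR,
    morphs accepted as bona fide) equals the false rejection rate (FRR,
    bona fides rejected). labels: 1 = bona fide, 0 = morph; higher score
    means "more likely bona fide"."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):          # sweep decision thresholds
        accept = scores >= t
        far = np.mean(accept[labels == 0])   # morphs wrongly accepted
        frr = np.mean(~accept[labels == 1])  # bona fides wrongly rejected
        if abs(far - frr) < best_gap:        # keep the crossing point
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy example with two partially overlapping score distributions.
rng = np.random.default_rng(0)
bona = rng.normal(0.7, 0.1, 500)    # hypothetical bona fide scores
morph = rng.normal(0.4, 0.1, 500)   # hypothetical morph scores
scores = np.concatenate([bona, morph])
labels = np.concatenate([np.ones(500, int), np.zeros(500, int)])
print(f"EER = {compute_eer(labels, scores):.3f}")
```

Under this convention, a "23% improvement in EER" over a baseline means the stronger detector's EER is at least 23% lower than the baseline's at its own equal-error operating point.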