We present MeFEm, a vision model based on a modified Joint Embedding Predictive Architecture (JEPA) for biometric and medical analysis from facial images. Key modifications include an axial stripe masking strategy that focuses learning on semantically relevant regions, a circular loss weighting scheme, and probabilistic reassignment of the CLS token for high-quality linear probing. Trained on a consolidated dataset of curated images, MeFEm outperforms strong baselines such as FaRL and Franca on core anthropometric tasks despite using significantly less data. It also shows promising results on Body Mass Index (BMI) estimation, evaluated on a novel, consolidated, closed-source dataset that addresses the domain bias prevalent in existing data. Model weights are available at https://huggingface.co/boretsyury/MeFEm, offering a strong baseline for future work in this domain.
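The abstract names axial stripe masking but does not specify it. As an illustration only, the sketch below shows one plausible reading: whole rows or whole columns of the patch grid are masked as contiguous stripes. The function name `axial_stripe_mask` and all of its parameters (`num_stripes`, `stripe_width`, `axis`) are hypothetical and are not taken from the MeFEm implementation.

```python
import random


def axial_stripe_mask(grid_h, grid_w, num_stripes, stripe_width, axis, rng):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    Assumption (not from the paper): a stripe is a band of `stripe_width`
    adjacent rows (axis="row") or columns (axis="col") spanning the full grid.
    Stripes are sampled independently, so they may overlap.
    """
    if axis not in ("row", "col"):
        raise ValueError("axis must be 'row' or 'col'")
    extent = grid_h if axis == "row" else grid_w
    if not 0 < stripe_width <= extent:
        raise ValueError("stripe_width must fit inside the grid")

    mask = [[False] * grid_w for _ in range(grid_h)]
    for _ in range(num_stripes):
        # Pick where the stripe starts along the chosen axis.
        start = rng.randrange(0, extent - stripe_width + 1)
        for idx in range(start, start + stripe_width):
            if axis == "row":
                for col in range(grid_w):
                    mask[idx][col] = True
            else:
                for row in range(grid_h):
                    mask[row][idx] = True
    return mask


# Example: a 14 x 14 patch grid with two column stripes, each two patches wide.
example_mask = axial_stripe_mask(14, 14, num_stripes=2, stripe_width=2,
                                 axis="col", rng=random.Random(0))
```

In a JEPA-style setup, the patches marked `True` would be the targets whose embeddings are predicted from the visible context; how MeFEm actually chooses stripe count, width, and orientation is not stated in the abstract.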