Illumination variation has long been a challenge for real-world facial expression recognition (FER). Under uncontrolled or non-visible lighting conditions, near-infrared (NIR) imaging offers a simple alternative for obtaining high-quality images and for supplementing geometric and texture details that are missing in the visible (VIS) domain. However, because no large-scale NIR facial expression datasets exist, directly extending VIS FER methods to the NIR spectrum may be ineffective. In addition, previous heterogeneous image synthesis methods offer low controllability because they lack prior task knowledge. To tackle these issues, we present NIR-FER Stochastic Differential Equations (NFER-SDE), the first approach that transforms facial expression appearance between heterogeneous modalities to mitigate overfitting on small-scale NIR data. NFER-SDE takes the entire VIS source image as input and uses domain-specific knowledge to guide the preservation of modality-invariant information in the image's high-frequency content. Extensive experiments and ablation studies show that NFER-SDE significantly improves NIR FER performance and achieves state-of-the-art results on Oulu-CASIA and Large-HFE, the only two available NIR FER datasets.
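The abstract does not specify how NFER-SDE extracts or constrains high-frequency content. As a minimal sketch, assuming a box-blur high-pass filter and a mean-squared consistency term (both illustrative choices; `box_blur`, `high_frequency`, and `hf_consistency` are hypothetical names, not the authors' code), preservation of high-frequency information between a VIS source image and a synthesized NIR image could be measured as follows:

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def box_blur(img: Image, radius: int = 1) -> Image:
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, borders replicated."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to replicate border pixels
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def high_frequency(img: Image, radius: int = 1) -> Image:
    """High-pass residual: the image minus its low-pass (blurred) version."""
    low = box_blur(img, radius)
    return [[p - q for p, q in zip(row, lrow)] for row, lrow in zip(img, low)]


def hf_consistency(src: Image, gen: Image, radius: int = 1) -> float:
    """Hypothetical guidance term (an assumption, not the paper's loss):
    mean squared difference between the high-frequency components of the
    VIS source and the synthesized NIR image."""
    hs, hg = high_frequency(src, radius), high_frequency(gen, radius)
    diffs = [(a - b) ** 2 for rs, rg in zip(hs, hg) for a, b in zip(rs, rg)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    src = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
    shifted = [[p + 5 for p in row] for row in src]  # global intensity change only
    flat = [[3.0] * 4 for _ in range(4)]             # no structure at all
    print(hf_consistency(src, shifted))  # ~0: edges and texture preserved
    print(hf_consistency(src, flat))     # > 0: structure lost
```

A uniform intensity shift, a crude stand-in for a change in spectral appearance, leaves the high-pass residual essentially unchanged, while discarding structure does not; this is the intuition behind treating high-frequency content as modality-invariant. A Gaussian, Laplacian, or wavelet decomposition would be equally plausible choices for the filter.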