This paper introduces LLDif, a novel diffusion-based facial expression recognition (FER) framework tailored for extremely low-light (LL) environments. Images captured under such conditions typically suffer from low brightness and severely reduced contrast, and the resulting loss of image quality significantly degrades the accuracy of conventional emotion recognition methods. LLDif addresses these issues with a two-stage training process that combines a Label-aware CLIP (LA-CLIP), an embedding prior network (PNET), and a transformer-based network adept at handling the noise in low-light images. In the first stage, LA-CLIP generates a joint embedding prior distribution (EPD) that guides the LLformer in label recovery. In the second stage, a diffusion model (DM) refines the EPD inference, utilising the compactness of the EPD for precise predictions. Experimental evaluations on various LL-FER datasets show that LLDif achieves competitive performance, underscoring its potential to enhance FER applications under challenging lighting conditions.
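To make the second-stage mechanism concrete, the following is a minimal numerical sketch of diffusion-based refinement of an embedding prior. It is not the paper's implementation: it assumes a DDIM-style deterministic reverse process, a linear cumulative-signal schedule, and an oracle noise predictor standing in for the trained network (in LLDif, a learned model conditioned on the noisy embedding and timestep would play this role).

```python
import numpy as np

rng = np.random.default_rng(0)
D, T = 64, 10                        # embedding dimension, diffusion steps (illustrative)
x0 = rng.normal(size=D)              # target embedding (stand-in for the EPD)
abar = np.linspace(1.0, 0.05, T + 1)  # cumulative signal level, from 1 (clean) down to 0.05

# Forward process: corrupt the embedding with Gaussian noise at step T.
noise = rng.normal(size=D)
x_t = np.sqrt(abar[T]) * x0 + np.sqrt(1 - abar[T]) * noise

def denoiser(x, t):
    # Oracle noise predictor used only for this sketch; a trained network
    # would have to estimate the noise from (x, t) alone.
    return noise

# DDIM-style deterministic reverse process: estimate the clean embedding
# at each step, then move to the next (less noisy) signal level.
for t in range(T, 0, -1):
    eps = denoiser(x_t, t)
    x0_hat = (x_t - np.sqrt(1 - abar[t]) * eps) / np.sqrt(abar[t])
    x_t = np.sqrt(abar[t - 1]) * x0_hat + np.sqrt(1 - abar[t - 1]) * eps

assert np.allclose(x_t, x0)  # with an oracle predictor the embedding is recovered exactly
```

With a learned (imperfect) noise predictor the recovery is approximate rather than exact, which is why the compactness of the EPD matters: a low-dimensional, well-structured prior is easier for the reverse process to refine into a precise prediction.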