Most research on facial expression recognition (FER) is conducted in highly controlled environments, and the resulting models often perform unacceptably when applied to real-world situations. This is because when unexpected objects occlude the face, the FER network has difficulty extracting facial features and accurately predicting the expression. Occluded FER (OFER) is therefore a challenging problem. Previous studies on occlusion-aware FER have typically required fully annotated facial images for training, but collecting facial images with various occlusions and expression annotations is time-consuming and expensive. The proposed method, Latent-OFER, detects occlusions, restores the occluded parts of the face as if they were unoccluded, and recognizes the expression, improving FER accuracy. The approach involves three steps. First, a vision transformer (ViT)-based occlusion patch detector masks the occluded positions; it is trained only on latent vectors from unoccluded patches, using the support vector data description (SVDD) algorithm. Second, a hybrid reconstruction network, which combines the ViT with a convolutional neural network (CNN), fills in the masked positions to generate a complete image. Last, an expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map. This mechanism has a significant advantage in preventing performance degradation from occlusion by unseen objects. Experimental results on several databases demonstrate the superiority of the proposed method over state-of-the-art methods.
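The first step, flagging occluded patches with a one-class model fitted only on unoccluded latent vectors, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a heavily simplified stand-in for support vector data description: the sphere center is fixed at the mean of the training latents and the radius is a distance quantile, rather than the solution of the actual optimization problem. The latent vectors are synthetic rather than ViT outputs, and the names `fit_sphere` and `occlusion_mask` are hypothetical.

```python
import math
import random


def fit_sphere(latents, quantile=0.95):
    """Fit a simplified hypersphere around latent vectors of unoccluded patches.

    Assumption: center = mean of the latents, radius = the given quantile of
    the distances to that center. Real SVDD solves an optimization problem.
    """
    dim = len(latents[0])
    center = [sum(v[i] for v in latents) / len(latents) for i in range(dim)]
    dists = sorted(math.dist(v, center) for v in latents)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]


def occlusion_mask(patch_latents, center, radius):
    """Return True for each patch whose latent vector lies outside the sphere."""
    return [math.dist(v, center) > radius for v in patch_latents]


# Synthetic "unoccluded" latents: 200 vectors of dimension 8 near the origin.
rng = random.Random(0)
train = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(200)]
center, radius = fit_sphere(train)

# One patch at the sphere center (typical) and one far away (atypical).
patches = [list(center), [5.0] * 8]
mask = occlusion_mask(patches, center, radius)
print(mask)
```

In this toy setup the patch far from the training distribution is flagged and the typical patch is not. No occluded examples are needed at training time, which is the property the abstract relies on for handling unseen occluding objects.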