Facial expressions convey a great deal of information and play a crucial role in emotional expression. Deep neural networks (DNNs) combined with deep metric learning (DML) techniques improve the discriminative ability of models in facial expression recognition (FER) applications. A DNN trained only with a classification loss such as cross-entropy does not compact intra-class feature variation or separate inter-class feature distances as well as one reinforced by a supporting DML loss term. The triplet center loss (TCL) function is applied to all dimensions of a sample's embedding in the embedding space. In this work, we develop three negative sample selection strategies: fully synthesized, semi-synthesized, and prediction-based. To further improve results, we introduce a selective attention module that combines pixel-wise and element-wise attention coefficients computed from high-semantic deep features of the input samples. We evaluate the proposed method on RAF-DB, a highly imbalanced dataset. The experimental results show significant improvements over the baseline for all three negative sample selection strategies.
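For readers unfamiliar with TCL, the following is a minimal sketch of the commonly used triplet-center formulation, in which each sample is pulled toward its own class center and pushed at least a margin away from the nearest other-class center, with the distance computed over all embedding dimensions. It is an illustration under stated assumptions, not the method proposed here: the function names, the squared Euclidean distance, and the default margin are hypothetical choices, and the negative sample selection strategies and the selective attention module are not modeled.

```python
def squared_distance(a, b):
    """Squared Euclidean distance over all embedding dimensions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_center_loss(embeddings, labels, centers, margin=1.0):
    """Mean triplet-center loss over a batch (illustrative sketch).

    embeddings: list of feature vectors, one per sample
    labels:     list of integer class indices, one per sample
    centers:    list of class-center vectors, indexed by class
    margin:     assumed hyperparameter; the value 1.0 is arbitrary
    """
    total = 0.0
    for emb, label in zip(embeddings, labels):
        # Distance to the sample's own class center (positive term).
        pos = squared_distance(emb, centers[label])
        # Distance to the nearest center of any other class (negative term).
        neg = min(
            squared_distance(emb, center)
            for k, center in enumerate(centers)
            if k != label
        )
        # Hinge: penalize only when the margin is violated.
        total += max(pos + margin - neg, 0.0)
    return total / len(embeddings)


# Toy usage with two classes in a 2-D embedding space.
centers = [[0.0, 0.0], [4.0, 0.0]]
easy = triplet_center_loss([[1.0, 0.0], [3.0, 0.0]], [0, 1], centers)
hard = triplet_center_loss([[2.0, 0.0]], [0], centers)
```

In the toy usage, the two samples lying close to their own centers incur no loss, while a sample equidistant from both centers is penalized by exactly the margin.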