Complex Facial Expression Recognition Using Deep Knowledge Distillation of Basic Features

Complex emotion recognition is a cognitive task on which machines have yet to match the excellent performance they achieve on other tasks, where they now perform at or above the level of human cognition. Emotion recognition through facial expressions is particularly difficult due to the complexity of the emotions expressed by the human face. For a machine to approach human-level performance in this domain, it may need to synthesise knowledge and understand new concepts in real time, as humans do. Humans are able to learn new concepts from only a few examples by distilling the important information from memories and discarding the rest. Similarly, continual learning methods learn new classes whilst retaining knowledge of known classes, and few-shot learning methods learn new classes from very few training examples. We propose a novel continual learning method, inspired by human cognition and learning, that can accurately recognise new compound expression classes from few training samples by building on and retaining its knowledge of basic expression classes. Using GradCAM visualisations, we demonstrate the relationship between basic and compound facial expressions, which our method leverages through knowledge distillation and a novel Predictive Sorting Memory Replay. Our method achieves the current state-of-the-art in continual learning for complex facial expression recognition, with 74.28% Overall Accuracy on new classes. We also demonstrate that continual learning for complex facial expression recognition achieves far better performance than non-continual learning, improving on state-of-the-art non-continual learning methods by 13.95%. To the best of our knowledge, our work is also the first to apply few-shot learning to complex facial expression recognition, achieving the state-of-the-art with 100% accuracy using a single training sample for each expression class.
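To make the role of knowledge distillation in continual learning concrete, the following is a minimal sketch of a generic distillation objective in the style of Hinton et al., not the method proposed here. It assumes a frozen teacher trained on the basic expression classes and a student whose output layer has been extended with new compound classes. The function names, the weighting `alpha`, and the temperature value are illustrative assumptions, and the Predictive Sorting Memory Replay is not modelled, since the abstract does not specify it.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def cross_entropy(logits, label):
    """Standard cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def continual_loss(student_logits, teacher_logits, label, alpha=0.5,
                   temperature=2.0):
    """Hypothetical combined objective for one sample.

    Assumption: the first len(teacher_logits) student outputs correspond to
    the old (basic) classes, so only those are matched to the teacher.
    """
    n_old = len(teacher_logits)
    kd = distillation_loss(student_logits[:n_old], teacher_logits, temperature)
    ce = cross_entropy(student_logits, label)
    return alpha * kd + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]        # logits over 3 old (basic) classes
    student = [1.5, 0.7, -0.8, 2.2]   # 3 old classes plus 1 new (compound) class
    print(continual_loss(student, teacher, label=3))
```

The distillation term penalises the student for drifting away from the teacher's predictions on the basic classes, while the cross-entropy term fits the new class; `alpha` trades off retention against plasticity.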
