Exploring the Boundaries of Semi-Supervised Facial Expression Recognition: Learning from In-Distribution, Out-of-Distribution, and Unconstrained Data

Deep learning-based methods have been the key driving force behind much of the recent success of facial expression recognition (FER) systems. However, the need for large amounts of labelled data remains a challenge. Semi-supervised learning offers a way to overcome this limitation, allowing models to learn from a small amount of labelled data along with a large unlabelled dataset. While semi-supervised learning has shown promise in FER, most current methods from the general computer vision literature have not been explored in the context of FER. In this work, we present a comprehensive study of 11 of the most recent semi-supervised methods in the context of FER, namely Pi-model, Pseudo-label, Mean Teacher, VAT, UDA, MixMatch, ReMixMatch, FixMatch, FlexMatch, CoMatch, and CCSSL. Our investigation covers semi-supervised learning from in-distribution, out-of-distribution, unconstrained, and very small unlabelled data. Our evaluation includes five FER datasets plus one large face dataset for unconstrained learning. Our results demonstrate that FixMatch consistently achieves better performance on in-distribution unlabelled data, while ReMixMatch stands out among all methods in the out-of-distribution, unconstrained, and scarce unlabelled data scenarios. Another significant observation is that semi-supervised learning produces a reasonable improvement over supervised learning, regardless of whether in-distribution, out-of-distribution, or unconstrained data is used as the unlabelled set. We also conduct sensitivity analyses on critical hyper-parameters for the two best methods in each setting.
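To make concrete how a model can learn from unlabelled data, the following is a minimal sketch of the confidence-thresholded pseudo-labelling idea behind FixMatch, one of the methods studied. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function names, the pure-Python formulation, and the threshold value of 0.95 are chosen here for the example, and the logits stand in for model outputs on weakly and strongly augmented views of the same unlabelled images.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax_at(logits, idx):
    """Log-probability of class `idx`, computed via log-sum-exp."""
    m = max(logits)
    return logits[idx] - m - math.log(sum(math.exp(z - m) for z in logits))


def pseudo_label_loss(weak_logits, strong_logits, threshold=0.95):
    """Hypothetical helper: unlabelled loss for one batch.

    For each sample, the prediction on the weakly augmented view becomes a
    pseudo-label only if its confidence reaches `threshold` (an assumed
    value). The loss is the cross-entropy of the strongly augmented view
    against that pseudo-label, averaged over the whole batch, so rejected
    samples contribute zero.
    Returns (loss, number_of_samples_kept).
    """
    total = 0.0
    kept = 0
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak)
        confidence = max(probs)
        if confidence < threshold:
            continue  # prediction too uncertain to trust as a label
        pseudo_label = probs.index(confidence)
        total += -log_softmax_at(strong, pseudo_label)
        kept += 1
    n = len(weak_logits)
    return (total / n if n else 0.0), kept


# Two unlabelled samples, three expression classes (illustrative numbers).
weak = [[10.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
strong = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss, kept = pseudo_label_loss(weak, strong)
```

In this toy batch only the first sample passes the confidence threshold, so it alone contributes to the loss. In a full training loop this term would be added, with a weighting factor, to the ordinary supervised loss on the labelled samples.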
