Deep learning has played a significant role in the success of facial expression recognition (FER), thanks to large models and vast amounts of labelled data. However, obtaining labelled data demands considerable human effort, time, and financial resources. Although some prior works have reduced the need for large amounts of labelled data using various unsupervised methods, another promising approach, active learning, has barely been explored in the context of FER. Active learning selects the most representative samples from an unlabelled set for labelling, making the best use of a limited 'labelling budget'. In this paper, we implement and study eight recent active learning methods on three public FER datasets: FER13, RAF-DB, and KDEF. Our findings show that existing active learning methods do not perform well in the context of FER, likely because they suffer from the 'Cold Start' problem, which occurs when the initial set of labelled samples does not represent the entire dataset well. To address this issue, we propose contrastive self-supervised pre-training, which first learns the underlying representations from the entire unlabelled dataset. We then apply the active learning methods and observe that our two-step approach shows up to 9.2% improvement over random sampling and up to 6.7% improvement over the best existing active learning baseline without pre-training. We will make the code for this study public upon publication at: github.com/ShuvenduRoy/ActiveFER.
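To make the idea of spending a limited labelling budget concrete, the following is a minimal sketch of one selection round in pool-based active learning. It is an illustration under stated assumptions, not the implementation from this study: the function names `entropy_scores` and `select_for_labelling` are hypothetical, predictive entropy is used as a generic uncertainty-based acquisition score (it may or may not correspond to any of the eight methods evaluated), and the class probabilities are hand-written stand-ins for the outputs of a model that, in the two-step approach, would first be pre-trained on the unlabelled data.

```python
import numpy as np


def entropy_scores(probs):
    """Predictive entropy per sample; higher means the model is less certain."""
    eps = 1e-12  # avoids log(0) for classes with zero predicted probability
    return -np.sum(probs * np.log(probs + eps), axis=1)


def select_for_labelling(probs, budget, strategy="entropy", rng=None):
    """Return indices of unlabelled samples to send for annotation.

    probs:    (n_samples, n_classes) predicted class probabilities
    budget:   number of samples that can be labelled in this round
    strategy: "entropy" (assumed acquisition score) or "random" (baseline)
    """
    n_samples = probs.shape[0]
    budget = min(budget, n_samples)
    if strategy == "random":
        if rng is None:
            rng = np.random.default_rng(0)
        return rng.choice(n_samples, size=budget, replace=False)
    scores = entropy_scores(probs)
    # Sort by descending entropy and keep the `budget` most uncertain samples.
    return np.argsort(scores)[::-1][:budget]


# Hypothetical predictions for four unlabelled images over three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])

chosen = select_for_labelling(probs, budget=2)
random_choice = select_for_labelling(probs, budget=2, strategy="random")
print(chosen, random_choice)
```

With these stand-in probabilities, the entropy strategy picks the nearly uniform prediction first, whereas the random baseline ignores the model's output entirely. In a cold-start setting, the probabilities themselves are unreliable, which is one way to read the motivation for learning representations before the first selection round.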