This paper presents our Facial Action Unit (AU) recognition submission to the fifth Affective Behavior Analysis in-the-wild (ABAW) Competition. Our approach consists of three main modules: (i) a pre-trained facial representation encoder that produces a strong facial representation from each face image in the input sequence; (ii) an AU-specific feature generator that learns a dedicated set of AU features from each facial representation; and (iii) a spatio-temporal graph learning module that constructs a spatio-temporal graph representation. This graph representation describes the AUs contained in all frames, and the module predicts the occurrence of each AU based on both the spatial information modeled within the corresponding face and the temporal dynamics learned across frames. The experimental results show that our approach outperforms the baseline, and that spatio-temporal graph representation learning allows our model to achieve the best results among all ablated systems. Our model ranked 4th in the AU recognition track of the 5th ABAW Competition.
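The abstract does not specify how the spatio-temporal graph is wired or how information is propagated over it. As a rough illustration only, the sketch below assumes one node per (frame, AU) pair, fully connected spatial edges among the AUs of a single frame, temporal edges linking the same AU in adjacent frames, one round of fixed mean-aggregation message passing, and a per-AU linear classifier followed by a sigmoid. All function names (`build_st_graph`, `message_pass`, `predict`) and the toy features are hypothetical stand-ins, not the submission's implementation. In the described system, the node features would come from the AU-specific feature generator and the aggregation would be learned.

```python
"""Hypothetical sketch of a spatio-temporal AU graph (not the authors' code)."""
import math
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (frame index t, AU index a)
Graph = Dict[Node, Set[Node]]
Features = Dict[Node, List[float]]


def build_st_graph(num_frames: int, num_aus: int) -> Graph:
    """Assumed topology: spatial edges join every pair of AUs in the same frame;
    temporal edges join the same AU in adjacent frames (undirected)."""
    adj: Graph = {(t, a): set() for t in range(num_frames) for a in range(num_aus)}
    for t in range(num_frames):
        for a in range(num_aus):
            for b in range(num_aus):
                if a != b:
                    adj[(t, a)].add((t, b))  # spatial edge within frame t
            if t + 1 < num_frames:
                adj[(t, a)].add((t + 1, a))  # temporal edge to the next frame
                adj[(t + 1, a)].add((t, a))  # and back, keeping the graph undirected
    return adj


def message_pass(features: Features, adj: Graph) -> Features:
    """One round of fixed mixing: h'_v = 0.5 * h_v + 0.5 * mean(h_u for u in N(v))."""
    out: Features = {}
    for v, h in features.items():
        nbrs = adj[v]
        dim = len(h)
        if nbrs:
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbour information
        out[v] = [0.5 * h[i] + 0.5 * mean[i] for i in range(dim)]
    return out


def predict(features: Features, weights: List[List[float]], bias: List[float]) -> Dict[Node, float]:
    """Per-AU linear head + sigmoid: occurrence probability for every (frame, AU) node."""
    probs: Dict[Node, float] = {}
    for (t, a), h in features.items():
        logit = sum(w * x for w, x in zip(weights[a], h)) + bias[a]
        probs[(t, a)] = 1.0 / (1.0 + math.exp(-logit))
    return probs


if __name__ == "__main__":
    T, A = 3, 2  # 3 frames, 2 AUs
    adj = build_st_graph(T, A)
    # Toy 2-d AU features, standing in for the AU-specific feature generator's output.
    feats = {(t, a): [float(t), float(a)] for t in range(T) for a in range(A)}
    updated = message_pass(feats, adj)
    probs = predict(updated, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
    for node in sorted(probs):
        print(node, round(probs[node], 3))
```

Keeping spatial and temporal edges in one adjacency structure is what lets a single propagation step mix within-face AU relationships with cross-frame context. A trained model would presumably replace the fixed 0.5/0.5 mixing and the hand-set classifier weights with learned parameters.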