Affective behavior analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this goal, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition establishes two tracks: the Multi-task Learning (MTL) Challenge and the Compound Expression (CE) Challenge, based on the Aff-Wild2 and C-EXPR-DB datasets. In this paper, we present our methods and experimental results for both tracks. Our work can be summarized in four aspects: 1) To obtain high-quality facial features, we train a Masked Autoencoder in a self-supervised manner. 2) We devise a temporal convergence module to capture temporal information across video frames, and we explore the impact of window size and sequence length on each sub-task. 3) To facilitate joint optimization of the sub-tasks, we explore how joint training and the fusion of features from individual tasks affect the performance of each task. 4) We use curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions, thereby improving the accuracy of compound expression recognition. Extensive experiments demonstrate the effectiveness of our designs.
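To make the sequence-length and window-size setting in aspect 2 concrete, the sketch below shows one common way to cut a video's per-frame features into fixed-length, possibly overlapping clips before they reach a temporal module. It is an illustration under assumptions, not the method used in this work: the function name `split_into_sequences` and the parameters `seq_len` and `stride` are hypothetical, and reading "window size" as the stride between clips is our own interpretation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_sequences(frames: Sequence[T], seq_len: int, stride: int) -> List[List[T]]:
    """Cut per-frame features into clips of `seq_len` frames (hypothetical helper).

    Consecutive clips start `stride` frames apart. If the strided clips leave
    trailing frames uncovered, one extra clip aligned to the end of the video
    is appended. A video shorter than `seq_len` is returned as a single clip.
    """
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")

    n = len(frames)
    if n == 0:
        return []
    if n <= seq_len:
        return [list(frames)]

    # Start indices of the regularly strided clips.
    starts = list(range(0, n - seq_len + 1, stride))
    # Add an end-aligned clip so that the last frames are always covered.
    if starts[-1] != n - seq_len:
        starts.append(n - seq_len)

    return [list(frames[s:s + seq_len]) for s in starts]


if __name__ == "__main__":
    # Ten dummy "frames" stand in for per-frame feature vectors.
    for clip in split_into_sequences(list(range(10)), seq_len=4, stride=3):
        print(clip)
```

With `stride` smaller than `seq_len` the clips overlap, so each frame is seen in several temporal contexts; with `stride` equal to `seq_len` the clips are disjoint apart from the end-aligned one.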