Facial action units (AUs) play an indispensable role in human emotion analysis. Although real-world applications urgently need AU-based high-level emotion analysis, the frame-level AU results produced by previous works cannot be used directly for such analysis. Moreover, because AUs are dynamic processes, global temporal information is important, yet it has been largely ignored in the literature. To this end, we propose EventFormer for AU event detection, the first work to detect AU events directly from a video sequence by framing AU event detection as a multiple class-specific set prediction problem. Extensive experiments on BP4D, a commonly used AU benchmark dataset, show the superiority of EventFormer under suitable metrics.
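To make the difference between frame-level AU results and AU events concrete, here is a minimal sketch of the target representation only, not of EventFormer itself. It assumes an event can be written as an inclusive (onset, offset) frame pair and that each AU class gets its own set of events; the function names, the event encoding, and the toy labels are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical event encoding: (onset_frame, offset_frame), both inclusive.
Event = Tuple[int, int]


def frames_to_events(active: List[int]) -> List[Event]:
    """Group maximal runs of consecutive active frames into events."""
    events: List[Event] = []
    onset = None  # start index of the run currently being tracked
    for t, is_active in enumerate(active):
        if is_active and onset is None:
            onset = t  # a new run starts here
        elif not is_active and onset is not None:
            events.append((onset, t - 1))  # the run ended on the previous frame
            onset = None
    if onset is not None:
        # The sequence ended while a run was still open.
        events.append((onset, len(active) - 1))
    return events


def per_class_event_sets(frame_labels: Dict[str, List[int]]) -> Dict[str, List[Event]]:
    """Build one event set per AU class from per-frame binary labels."""
    return {au: frames_to_events(seq) for au, seq in frame_labels.items()}


# Toy per-frame labels (made up for illustration, not taken from BP4D).
frame_labels = {
    "AU6": [0, 1, 1, 1, 0, 0, 1, 1],
    "AU12": [0, 0, 1, 1, 1, 1, 0, 0],
}
event_sets = per_class_event_sets(frame_labels)
print(event_sets)
```

Grouping frames after the fact, as this sketch does, is the kind of post-processing that frame-level methods would need before event-level analysis; the abstract's point is that EventFormer instead predicts such class-specific event sets directly from the video sequence.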