Micro-gesture analysis is attracting increasing attention as a way to infer spontaneous emotion from subtle body movements. Micro-gesture online recognition (MGR), which localizes and classifies every gesture instance in untrimmed videos, is a core task of the 4th EI-MiGA-IJCAI Challenge. Compared with typical temporal action detection (TAD), MGR places particular emphasis on both localizing and classifying each action: the model must output the start time, end time, and category of every micro-gesture. Moreover, because micro-gestures are highly spontaneous, a single modality struggles to capture the relevant cues completely and accurately. In this work, we propose DyFADet+, which extends DyFADet into a dual-stream RGB-skeleton framework. Both modalities are projected into shared multi-scale temporal embeddings and fused by a gated residual module, which adaptively injects skeleton motion into the RGB representation instead of naively concatenating the two streams. The fused features are then decoded by a Dynamic TAD head for online classification and boundary regression. On the SMG dataset, our method achieves an F1 score of 40.88, ranking 2nd in the Micro-gesture Online Recognition track.
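The abstract does not specify the exact form of the gated residual module, so the following is only a minimal sketch of one common formulation, assumed here for illustration: at each time step a sigmoid gate is computed from the concatenated RGB and skeleton features, and the gated skeleton projection is added to the RGB feature as a residual. All names (`gated_residual_fusion`, `w_gate`, `w_skel`, and so on) are hypothetical, the weights are random rather than learned, and plain Python lists stand in for the tensors a real implementation would use.

```python
import math
import random


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(weights, bias, x):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_residual_fusion(rgb, skel, w_gate, b_gate, w_skel, b_skel):
    """Fuse two feature sequences step by step (assumed formulation).

    rgb, skel: lists of T feature vectors of length D, assumed to be
    already projected into a shared embedding space.
    Output at step t: rgb_t + gate_t * proj(skel_t), where
    gate_t = sigmoid(W_g [rgb_t ; skel_t] + b_g).
    """
    fused = []
    for r, s in zip(rgb, skel):
        # Gate sees both modalities; r + s is list concatenation here.
        gate = [sigmoid(g) for g in linear(w_gate, b_gate, r + s)]
        # Skeleton features projected before injection.
        inject = linear(w_skel, b_skel, s)
        # Residual connection: RGB passes through unchanged when gate is 0.
        fused.append([ri + gi * vi for ri, gi, vi in zip(r, gate, inject)])
    return fused


# Tiny demo with random (untrained) parameters: T = 4 steps, D = 3 channels.
random.seed(0)
T, D = 4, 3
rgb_feats = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
skel_feats = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
w_gate = [[random.gauss(0, 0.1) for _ in range(2 * D)] for _ in range(D)]
w_skel = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]
out = gated_residual_fusion(rgb_feats, skel_feats,
                            w_gate, [0.0] * D, w_skel, [0.0] * D)
print(len(out), len(out[0]))
```

Unlike concatenation, this form keeps the fused feature at the RGB dimensionality and lets the gate suppress the skeleton stream per channel and per time step, which is one plausible reading of "adaptively injects" in the text above.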