In this paper, we study a novel problem in egocentric action recognition, which we term "Multimodal Generalization" (MMG). MMG studies how systems can generalize when data from certain modalities is limited or even completely missing. We thoroughly investigate MMG in the context of standard supervised action recognition and in the more challenging few-shot setting for learning new action categories. MMG consists of two novel scenarios, designed to support security and efficiency considerations in real-world applications: (1) missing modality generalization, where some modalities that were present at training time are missing at inference time, and (2) cross-modal zero-shot generalization, where the modalities present at inference time and at training time are disjoint. To enable this investigation, we construct a new dataset, MMG-Ego4D, containing data points with video, audio, and inertial motion sensor (IMU) modalities. Our dataset is derived from the Ego4D dataset, but it has been processed and thoroughly re-annotated by human experts to facilitate research on the MMG problem. We evaluate a diverse array of models on MMG-Ego4D and propose new methods with improved generalization ability. In particular, we introduce a new fusion module with modality dropout training, contrastive-based alignment training, and a novel cross-modal prototypical loss for better few-shot performance. We hope this study will serve as a benchmark and guide future research on multimodal generalization problems. The benchmark and code will be available at https://github.com/facebookresearch/MMG_Ego4D.
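To make the missing modality scenario more concrete, the sketch below shows the general idea behind modality dropout training. During training, each modality is randomly dropped so the fused representation learns to cope with absent inputs. At inference, the model is then fed only whichever modalities are available. This is a minimal illustration under stated assumptions, not the paper's fusion module. The fusion here is a hypothetical simple mean over the present modality embeddings. The names `fuse`, `modality_dropout`, and `p_drop` are illustrative only.

```python
import random
from typing import Dict, List

Embedding = List[float]


def fuse(features: Dict[str, Embedding]) -> Embedding:
    """Hypothetical fusion: element-wise mean over the modalities that are present."""
    if not features:
        raise ValueError("at least one modality is required")
    if len({len(v) for v in features.values()}) != 1:
        raise ValueError("all modality embeddings must share one dimension")
    n = len(features)
    # zip(*...) groups the i-th component of every modality embedding together
    return [sum(components) / n for components in zip(*features.values())]


def modality_dropout(
    features: Dict[str, Embedding], p_drop: float, rng: random.Random
) -> Dict[str, Embedding]:
    """Drop each modality independently with probability p_drop (training only).

    At least one modality is always kept, so fuse() never receives an empty input.
    """
    kept = {m: v for m, v in features.items() if rng.random() >= p_drop}
    if not kept:
        m = rng.choice(sorted(features))
        kept = {m: features[m]}
    return kept


if __name__ == "__main__":
    feats = {"video": [1.0, 2.0], "audio": [3.0, 4.0], "imu": [5.0, 6.0]}
    rng = random.Random(0)
    # Training step: fuse a randomly reduced subset of modalities.
    print(fuse(modality_dropout(feats, p_drop=0.5, rng=rng)))
    # Inference with audio missing (a subset of the training modalities).
    print(fuse({"video": feats["video"], "imu": feats["imu"]}))
```

Because the hypothetical fusion averages only over the modalities it receives, the same function serves both full-modality training and reduced-modality inference without any architectural change. That property is what the missing modality scenario requires.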