This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M integrates audiovisual cues through a dual-pathway architecture with specialized streams for audio and video inputs. A key innovation is its cross-modality multihead attention mechanism, which fuses the two modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance relative to AVCAffe's single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
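The components named above (dual audio/video pathways, cross-modality multihead attention, and three label-specific branches) can be illustrated with a minimal PyTorch sketch. All dimensions, layer choices, and names here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MMSketch(nn.Module):
    """Hypothetical sketch of the described design: two modality streams,
    cross-modality multi-head attention fusion, and three task-specific
    branches (one per cognitive-load label)."""

    def __init__(self, audio_dim=128, video_dim=256, d_model=256,
                 n_heads=4, n_classes=3):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)  # audio pathway
        self.video_proj = nn.Linear(video_dim, d_model)  # video pathway
        # Cross-modality fusion: video tokens attend over audio tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                batch_first=True)
        # Three specialized branches, one per cognitive-load label.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, n_classes) for _ in range(3)])

    def forward(self, audio, video):
        a = self.audio_proj(audio)   # (B, T_audio, d_model)
        v = self.video_proj(video)   # (B, T_video, d_model)
        fused, _ = self.cross_attn(query=v, key=a, value=a)
        pooled = fused.mean(dim=1)   # temporal average pooling
        return [head(pooled) for head in self.heads]  # 3 task outputs

# Usage: random tensors stand in for real audio/video embeddings.
audio = torch.randn(2, 50, 128)   # batch of 2, 50 audio frames
video = torch.randn(2, 30, 256)   # batch of 2, 30 video frames
logits = MMSketch()(audio, video)
```

The three heads share the fused representation, so a joint loss (e.g., a sum of per-label losses) would train all branches simultaneously, which is the usual multitask setup.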