Depression is a prevalent mental health disorder that severely impairs daily functioning and quality of life. While recent deep learning approaches for depression detection have shown promise, most rely on limited feature types, overlook explicit cross-modal interactions, and fuse modalities by simple concatenation or static weighting. To overcome these limitations, we propose CAF-Mamba, a novel Mamba-based cross-modal adaptive attention fusion framework. CAF-Mamba not only captures cross-modal interactions both explicitly and implicitly, but also dynamically adjusts the contribution of each modality through a modality-wise attention mechanism, enabling more effective multimodal fusion. Experiments on two in-the-wild benchmark datasets, LMVD and D-Vlog, demonstrate that CAF-Mamba consistently outperforms existing methods and achieves state-of-the-art performance.
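The modality-wise attention mentioned above can be illustrated with a minimal sketch: each modality embedding is scored by a small learned function, the scores are normalized into adaptive weights, and the fused representation is the weighted sum. This is only an assumed toy formulation for intuition, not the authors' implementation; the scoring parameters `W`, `v`, the shapes, and the function names are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modality_attention_fusion(feats, W, v):
    """Toy modality-wise attention: score each modality's embedding
    with a tiny MLP, normalize the scores into weights, and return
    the weighted sum as the fused multimodal representation."""
    scores = np.tanh(feats @ W) @ v      # one scalar score per modality
    weights = softmax(scores, axis=0)    # adaptive modality weights, sum to 1
    return weights @ feats, weights      # fused (dim,) vector and the weights

# toy example: 3 modalities (e.g. audio/visual/text), 4-dim embeddings
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4))
W, v = rng.standard_normal((4, 4)), rng.standard_normal(4)
fused, weights = modality_attention_fusion(feats, W, v)
```

Because the weights are recomputed per sample, an uninformative modality (say, a near-silent audio track) can be down-weighted dynamically rather than fixed in advance, which is the contrast the abstract draws with static weighting.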