Depression is a prevalent mental health disorder that severely impairs daily functioning and quality of life. While recent deep learning approaches for depression detection have shown promise, most rely on limited feature types, overlook explicit cross-modal interactions, and employ simple concatenation or static weighting for fusion. To overcome these limitations, we propose CAF-Mamba, a novel Mamba-based cross-modal adaptive attention fusion framework. CAF-Mamba not only captures cross-modal interactions both explicitly and implicitly, but also dynamically adjusts the contribution of each modality through a modality-wise attention mechanism, enabling more effective multimodal fusion. Experiments on two in-the-wild benchmark datasets, LMVD and D-Vlog, demonstrate that CAF-Mamba consistently outperforms existing methods and achieves state-of-the-art performance. Our code is available at https://github.com/zbw-zhou/CAF-Mamba.
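To make the modality-wise adaptive weighting idea concrete, below is a minimal NumPy sketch of the general mechanism: each modality's feature vector is scored by a small gating function, the scores are normalized with a softmax into per-modality weights, and the fused representation is their weighted sum. The function name, the dict-based interface, and the linear scoring layer (`w`, `b`) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def modality_wise_fusion(features, w, b):
    """Fuse per-modality feature vectors with adaptive weights.

    features: dict mapping modality name -> (d,) feature vector
    w, b:     parameters of a linear scoring layer (hypothetical
              stand-in for a learned gating network)

    Returns the fused (d,) vector and the per-modality weights.
    """
    names = sorted(features)
    X = np.stack([features[m] for m in names])   # (M, d) stacked modalities
    scores = X @ w + b                           # (M,) one score per modality
    alpha = softmax(scores)                      # adaptive weights, sum to 1
    fused = (alpha[:, None] * X).sum(axis=0)     # (d,) weighted fusion
    return fused, dict(zip(names, alpha))

# Example: with a zero scoring layer, both modalities get equal weight 0.5.
fused, alpha = modality_wise_fusion(
    {"audio": np.ones(4), "visual": np.zeros(4)},
    w=np.zeros(4), b=0.0,
)
```

In a trained model the weights `alpha` would shift toward whichever modality is more informative for a given sample, rather than staying fixed as in static-weight fusion.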