Depression is a severe global mental health issue that impairs daily functioning and overall quality of life. Although recent audio-visual approaches have improved automatic depression detection, methods that ignore emotional cues often fail to capture subtle depressive signals hidden within emotional expressions. Conversely, those incorporating emotions frequently confuse transient emotional expressions with stable depressive symptoms in feature representations, a phenomenon termed \emph{Emotional Ambiguity}, thereby leading to detection errors. To address this critical issue, we propose READ-Net, the first audio-visual depression detection framework explicitly designed to resolve Emotional Ambiguity through Adaptive Feature Recalibration (AFR). The core insight of AFR is to dynamically adjust the weights of emotional features to enhance depression-related signals. Rather than merely overlooking or naively combining emotional information, READ-Net identifies and preserves depression-relevant cues within emotional features while adaptively filtering out irrelevant emotional noise. This recalibration strategy clarifies feature representations and effectively mitigates the persistent challenge of emotional interference. Additionally, READ-Net can be easily integrated into existing frameworks for improved performance. Extensive evaluations on three publicly available datasets show that READ-Net outperforms state-of-the-art methods, with average gains of 4.55\% in accuracy and 1.26\% in F1-score, demonstrating its robustness to emotional disturbances and its effectiveness for audio-visual depression detection.
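The abstract describes AFR as dynamically re-weighting emotional features to amplify depression-related signals while suppressing transient emotional noise. The paper's actual architecture is not specified here, so the following is only a minimal sketch of one plausible realization, assuming a learned sigmoid gating layer (the function names, matrix \texttt{W}, and bias \texttt{b} are hypothetical, not from the paper):

```python
import numpy as np

def sigmoid(x):
    # Numerically standard logistic function mapping scores to (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_feature_recalibration(emotion_feat, W, b):
    """Hypothetical sketch of AFR-style gating.

    A learned linear map produces a per-dimension gate in (0, 1):
    dimensions carrying depression-relevant cues would be kept
    (gate near 1), while transient emotional noise is attenuated
    (gate near 0). Returns the recalibrated features and the gate.
    """
    gate = sigmoid(W @ emotion_feat + b)
    return gate * emotion_feat, gate

# Toy usage with random (untrained) parameters, for shape illustration only.
rng = np.random.default_rng(0)
d = 8                                   # assumed feature dimensionality
feat = rng.standard_normal(d)           # stand-in emotional feature vector
W = 0.1 * rng.standard_normal((d, d))   # gate projection (would be learned)
b = np.zeros(d)
recal, gate = adaptive_feature_recalibration(feat, W, b)
```

Because the gate is bounded in (0, 1), recalibration can only attenuate or preserve each feature dimension, never invert it, which matches the stated goal of filtering emotional noise rather than discarding the emotional stream outright.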