Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

Depression is a common mental disorder. Automatic depression detection tools that use speech, enabled by machine learning, can aid early screening for depression. This paper addresses two limitations that may hinder the clinical implementation of such tools: noise resulting from segment-level labelling, and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling, and introduce a hierarchical interpretation approach that provides both speech-level and sentence-level interpretations. The approach is based on gradient-weighted attention maps derived from all attention layers, which track interactions between input features. We show that the proposed model outperforms a model that learns at the segment level (precision $p$=0.854, recall $r$=0.947, $F1$=0.897, compared with $p$=0.732, $r$=0.808, $F1$=0.768). For model interpretation, we use one true-positive sample to show which sentences within a given speech are most relevant to depression detection, and which text tokens and Mel-spectrogram regions within those sentences are most relevant. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementation.
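The abstract does not give the exact formula for combining gradient-weighted attention maps across layers. A minimal sketch, assuming the relevance update popularised by Chefer et al. (2021) for bi-modal transformers, is shown below. For each layer, attention is weighted by its gradient, negative values are clipped, heads are averaged, and the result is propagated as R = R + Ā·R, starting from the identity. The function names, the use of a classification token at index 0, and the random stand-in attention maps and gradients are all hypothetical. In a real model, the gradients would come from backpropagating the depression score. This is not necessarily the paper's own formulation.

```python
import numpy as np

def gradient_weighted_relevance(attentions, gradients):
    """Aggregate gradient-weighted attention maps over all layers (assumed rule).

    attentions, gradients: lists with one array per layer, each of shape
    (num_heads, n_tokens, n_tokens); gradients[l] stands for d(score)/d(attentions[l]).
    Returns an (n_tokens, n_tokens) relevance matrix.
    """
    n_tokens = attentions[0].shape[-1]
    # Start from the identity: each input is initially relevant only to itself.
    relevance = np.eye(n_tokens)
    for attn, grad in zip(attentions, gradients):
        # Weight each head by its gradient, keep positive contributions, average heads.
        weighted = np.clip(grad * attn, 0.0, None).mean(axis=0)
        # Propagate relevance through this layer; the identity term mirrors the residual path.
        relevance = relevance + weighted @ relevance
    return relevance

def top_k_inputs(relevance, k, cls_index=0):
    """Rank inputs by the (hypothetical) classification token's row, excluding itself."""
    scores = relevance[cls_index].astype(float).copy()
    scores[cls_index] = -np.inf
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)

def fake_layers(n_layers, n_heads, n_tokens):
    """Random stand-ins for row-softmaxed attention maps and their gradients."""
    attns, grads = [], []
    for _ in range(n_layers):
        logits = rng.normal(size=(n_heads, n_tokens, n_tokens))
        attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
        attns.append(attn)
        grads.append(rng.normal(size=attn.shape))
    return attns, grads

# Speech level: position 0 is a classification token, positions 1..10 are sentences.
speech_relevance = gradient_weighted_relevance(*fake_layers(4, 8, 11))
top_sentences = top_k_inputs(speech_relevance, k=3)

# Sentence level: within each selected sentence, rank its 20 text tokens.
for sentence_pos in top_sentences:
    sentence_relevance = gradient_weighted_relevance(*fake_layers(4, 8, 21))
    print(sentence_pos, top_k_inputs(sentence_relevance, k=5))
```

The two-stage loop mirrors the hierarchical idea described in the abstract. Sentences are ranked first at the speech level, and then inputs are ranked only inside the top-ranked sentences. Under the same assumptions, the sentence-level ranking could equally be applied to Mel-spectrogram patches instead of text tokens.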
