Alzheimer's dementia (AD) affects memory, thinking, and language, degrading a person's quality of life. Early diagnosis is important because it enables the person to receive medical help and maintain quality of life. Consequently, using spontaneous speech together with machine learning methods to recognize AD patients has become an active research topic. Most previous works employ Convolutional Neural Networks (CNNs) to process the input signal. However, finding a suitable CNN architecture is time-consuming and requires domain expertise. Moreover, existing studies rely on early or late fusion to combine different modalities, or simply concatenate the representations of the different modalities during training, so inter-modal interactions are not captured. To tackle these limitations, we first exploit a Neural Architecture Search (NAS) method to automatically find a high-performing CNN architecture. Next, we exploit several fusion methods, including Multimodal Factorized Bilinear Pooling and Tucker Decomposition, to combine the speech and text modalities. To the best of our knowledge, no prior work has exploited a NAS approach and these fusion methods for dementia detection from spontaneous speech. We perform extensive experiments on the ADReSS Challenge dataset and show the effectiveness of our approach over state-of-the-art methods.
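To make concrete how a bilinear fusion method captures inter-modal interactions that plain concatenation misses, the following is a minimal NumPy sketch of Multimodal Factorized Bilinear Pooling as it is commonly formulated: project each modality, multiply element-wise, sum-pool over windows of size k, then apply power and L2 normalization. It is an illustration only, not the implementation used in this work. The function name `mfb_fuse`, the feature dimensions, and the random projection matrices are assumptions; in practice the projections would be learned jointly with the rest of the network.

```python
import numpy as np


def mfb_fuse(x, y, U, V, k):
    """Fuse two feature vectors with Multimodal Factorized Bilinear pooling.

    x: (m,) feature vector of the first modality (e.g. speech)
    y: (n,) feature vector of the second modality (e.g. text)
    U: (m, k * o) projection for x; V: (n, k * o) projection for y
    k: number of latent factors summed into each of the o output units
    """
    # Element-wise product of the two projections models pairwise
    # (multiplicative) interactions between the modalities.
    joint = (x @ U) * (y @ V)                 # shape (k * o,)
    # Sum pooling over non-overlapping windows of size k.
    z = joint.reshape(-1, k).sum(axis=1)      # shape (o,)
    # Power normalization followed by L2 normalization.
    z = np.sign(z) * np.sqrt(np.abs(z))
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z


# Hypothetical sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
speech_dim, text_dim, k, o = 128, 768, 4, 16
U = rng.standard_normal((speech_dim, k * o)) * 0.1
V = rng.standard_normal((text_dim, k * o)) * 0.1

speech_vec = rng.standard_normal(speech_dim)
text_vec = rng.standard_normal(text_dim)
fused = mfb_fuse(speech_vec, text_vec, U, V, k)
```

Each output unit is a low-rank approximation of a full bilinear form over the two inputs, so every speech feature can interact with every text feature while the parameter count stays linear in the input dimensions.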