Dementia is a progressive neurological disorder that profoundly affects the daily lives of older adults, impairing abilities such as verbal communication and cognitive function. Early diagnosis is essential for extending lifespan and improving quality of life for affected individuals. Yet diagnosing dementia is complex and often requires a multimodal approach that incorporates diverse types of clinical data. In this study, we fine-tune Wav2vec and Word2vec baseline models on two distinct data types: audio recordings and text transcripts. We experiment with four conditions: the original datasets and datasets purged of short sentences, each with and without data augmentation. Our results indicate that synonym-based text data augmentation generally improves model performance, underscoring the importance of data volume for achieving generalizable performance. Additionally, models trained on text data frequently excel and can further improve the performance of other modalities when combined with them. Audio and timestamp data sometimes offer marginal improvements. We provide a qualitative error analysis of the sentence archetypes that tend to be misclassified under each condition, offering insights into the effects of changing the data modality and the augmentation strategy.
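The abstract does not specify how the augmentation or the short-sentence filtering is performed. The following is a minimal sketch of the 2×2 design on text transcripts, assuming simple random synonym replacement from a hypothetical hand-written table (`SYNONYMS`), a hypothetical word-count cutoff for "short sentences" (`MIN_WORDS`), and that one augmented copy of each sentence is appended to the training set with its original label. It is an illustration, not the authors' pipeline.

```python
import random

# Hypothetical toy synonym table; the source does not say which synonym
# resource was used.
SYNONYMS = {
    "big": ["large", "huge"],
    "little": ["small", "tiny"],
    "boy": ["lad", "kid"],
    "falling": ["tumbling", "dropping"],
}

MIN_WORDS = 3  # hypothetical threshold for what counts as a "short sentence"


def synonym_augment(sentence, rng, p=0.5):
    """Return a copy of `sentence` in which each word found in SYNONYMS is
    replaced by a random synonym with probability `p`; other words are kept."""
    out = []
    for word in sentence.split():
        key = word.lower()
        if key in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[key]))
        else:
            out.append(word)
    return " ".join(out)


def build_condition(samples, drop_short, augment, seed=0):
    """Build one of the four training sets: original vs. short sentences
    removed, each with or without synonym augmentation.
    `samples` is a list of (sentence, label) pairs."""
    rng = random.Random(seed)
    if drop_short:
        # Keep only sentences with at least MIN_WORDS whitespace-separated words.
        samples = [(s, y) for s, y in samples if len(s.split()) >= MIN_WORDS]
    data = list(samples)
    if augment:
        # Append one augmented copy per sentence, keeping the original label.
        # A copy can equal its source if no word happens to be replaced.
        data += [(synonym_augment(s, rng), y) for s, y in samples]
    return data


if __name__ == "__main__":
    corpus = [
        ("the little boy is falling", 1),
        ("okay", 0),
        ("a big cookie jar", 0),
    ]
    for drop_short in (False, True):
        for augment in (False, True):
            data = build_condition(corpus, drop_short, augment)
            print(drop_short, augment, len(data))
```

For this toy corpus the loop prints training-set sizes of 3, 6, 2 and 4: filtering removes the one-word sentence, and augmentation doubles whatever remains, which mirrors how augmentation increases data volume in each condition.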