Conventional automatic word-naming recognition systems struggle to recognize words spoken by post-stroke patients with aphasia because of disfluencies and mispronunciations, which limits reliable automated assessment in this population. In this paper, we propose an approach to automatic word-naming recognition based on Contrastive Language-Audio Pretraining (CLAP) that addresses this challenge by leveraging text-audio alignment. Our approach treats word-naming recognition as an audio-text matching problem: it projects speech signals and textual prompts into a shared embedding space to identify the intended word even in challenging recordings. Evaluated on two speech datasets of French post-stroke patients with aphasia, our approach achieves up to 90% accuracy, outperforming existing classification-based and automatic speech recognition-based baselines.
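To make the audio-text matching formulation concrete, the following is a minimal sketch of the decision step only, not the implementation evaluated in the paper. It assumes that an audio encoder and a text encoder have already produced embeddings in the shared space, and that candidates are ranked by cosine similarity, which is a common choice for CLAP-style models but is not stated in the abstract. The function names `cosine_similarity` and `match_word`, the three French target words, and the 3-dimensional toy vectors are all hypothetical placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_word(audio_embedding, text_embeddings):
    """Return the candidate word whose text embedding is closest to the audio.

    audio_embedding: vector for one recording (assumed to come from an audio encoder).
    text_embeddings: dict mapping each candidate word to the vector of its
                     textual prompt (assumed to come from a text encoder).
    """
    scores = {
        word: cosine_similarity(audio_embedding, embedding)
        for word, embedding in text_embeddings.items()
    }
    best_word = max(scores, key=scores.get)
    return best_word, scores


# Toy stand-ins for encoder outputs; real embeddings would have hundreds of dimensions.
text_embeddings = {
    "chat": [0.9, 0.1, 0.0],
    "chien": [0.1, 0.9, 0.1],
    "maison": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of a recording that lies closest to the "chat" prompt.
audio_embedding = [0.7, 0.3, 0.1]

predicted_word, similarity_scores = match_word(audio_embedding, text_embeddings)
print(predicted_word)
```

Because the prediction is an argmax over a fixed set of prompts, the candidate vocabulary can in principle be changed by editing the prompt list rather than retraining a classifier head, assuming the encoders generalize to the new words.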