Clinical trials involve large numbers of clinicians, patients, and data collection environments, which makes gathering high-quality data a significant challenge. In these trials, patients are assessed from their speech recordings to detect and monitor cognitive and mental health disorders. We propose using the same recordings to verify the identities of enrolled patients and to identify and exclude individuals who try to enroll multiple times in the same trial. Because clinical studies are often conducted across different countries, a speaker verification system that works in diverse languages without additional development effort is essential. We evaluate pre-trained TitaNet, ECAPA-TDNN, and SpeakerNet models by enrolling and testing speech-impaired patients speaking English, German, Danish, Spanish, and Arabic. Our results show that the tested models generalize effectively to clinical speakers, with an equal error rate (EER) below 2.7% for the European languages and 8.26% for Arabic. This is a significant step toward more versatile and efficient speaker verification for cognitive and mental health clinical trials across a wide range of languages and dialects, and it substantially reduces the effort required to develop such systems for multiple languages. We also evaluate how the speech task and the number of speakers in a trial influence performance, and show that the type of speech task affects model performance.
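The equal error rate reported above is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). The sketch below illustrates one common way to estimate it from verification scores, assuming each trial is scored by cosine similarity between an enrollment embedding and a test embedding. The function names, the threshold sweep, and the toy scores are illustrative assumptions and are not taken from the study's evaluation pipeline.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two speaker embeddings (1-D vectors)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    genuine_scores: scores of same-speaker trials.
    impostor_scores: scores of different-speaker trials.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap = float("inf")
    eer = 1.0
    best_threshold = float(thresholds[0])
    for thr in thresholds:
        far = float(np.mean(impostor >= thr))  # impostor trials accepted
        frr = float(np.mean(genuine < thr))    # genuine trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
            best_threshold = float(thr)
    return eer, best_threshold


# Toy scores (hypothetical, for illustration only).
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold:.2f}")
```

In a duplicate-enrollment check, the same scoring could be applied between a new participant's recording and every previously enrolled speaker, flagging pairs whose score exceeds the chosen threshold for manual review.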