Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials

Detecting duplicate patient participation in clinical trials is a major challenge because repeated patients can undermine the credibility and accuracy of a trial's findings and create significant health and financial risks. Accurate automated speaker verification (ASV) models are crucial for verifying the identity of enrolled individuals and removing duplicates, but ASV performance depends on the size and quality of the available data, and the factors that affect ASV capabilities in clinical environments have received limited investigation. In this paper, we bridge this gap by analyzing how participant demographic characteristics, audio quality criteria, and the severity level of Alzheimer's disease (AD) impact ASV performance, using a dataset of speech recordings collected through multiple speech tasks from 659 participants with varying levels of AD. Our results indicate that ASV performance: 1) is slightly better on male speakers than on female speakers; 2) degrades for individuals who are above 70 years old; 3) is comparatively better for non-native English speakers than for native English speakers; 4) is negatively affected by clinician interference, noisy background, and unclear participant speech; and 5) tends to decrease as the severity level of AD increases. Our study finds that voice biometrics raise fairness concerns, as certain subgroups exhibit different ASV performance owing to their inherent voice characteristics. Moreover, ASV performance is influenced by the quality of speech recordings, which underscores the importance of improving data collection settings in clinical trials.
