A methodological framework and exemplar protocol for the collection and analysis of repeated speech samples

Nicholas Cummins, Lauren L. White, Zahia Rahman, Catriona Lucas, Tian Pan, Ewan Carr, Faith Matcham, Johnny Downs, Richard J. Dobson, Thomas F. Quatieri, Judith Dineley

From arXiv. Main manuscript: 37 pages, 3 figures, 8 tables, 1 textbox. Submitted to JMIR Research Methods. Replacement with format changes and copyediting.

Speech and language biomarkers have the potential to be regular, objective assessments of symptom severity in several health conditions, both in-clinic and remotely using mobile devices. However, the complex nature of speech and the often subtle changes associated with health mean that findings are highly dependent on methodological and cohort choices. These choices are often not reported adequately in studies investigating speech-based health assessment, hindering the progress of methodological speech research. Our objectives were to 1) facilitate replicable speech research by presenting an adaptable speech collection and analytical method and design checklist for other researchers to adapt for their own experiments, and 2) develop an exemplar protocol that reduces and controls for confounding factors in repeated recordings of speech, including device choice, speech elicitation task and non-pathological variability. The presented protocol comprises the elicitation of read speech, held vowels and a picture description, collected with a freestanding condenser microphone, 3 smartphones and a headset. We extracted a set of 14 exemplar speech features. We collected healthy speech from 28 individuals 3 times in 1 day, repeated at the same times 8-11 weeks later, and from 25 individuals on 3 days in 1 week at fixed times. Participant characteristics collected included sex, age, native language status and voice use habits. Before each recording, we collected information on recent voice use, food and drink intake, and emotional state. The extracted features are presented, providing a resource of normative values. Speech data collection, processing, analysis and reporting towards clinical research and practice vary widely. Greater harmonisation of study protocols and consistent reporting are urgently required to translate speech processing into clinical research and practice.
