There is growing evidence that machine learning and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment, such as dementia, and that produced by cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to make significant advances in this area. However, because researchers differ in their approaches and data selection strategies, results obtained by different groups have been difficult to compare directly. In this paper, we present TRESTLE (\textbf{T}oolkit for \textbf{R}eproducible \textbf{E}xecution of \textbf{S}peech \textbf{T}ext and \textbf{L}anguage \textbf{E}xperiments), an open source platform that focuses on two datasets from the TalkBank repository, with dementia detection as an illustrative domain. Successfully deployed in the hackallenge (Hackathon/Challenge) of the International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a precise digital blueprint of data pre-processing and selection strategies, which other researchers can reuse to obtain results that are directly comparable with those of their peers and with current state-of-the-art (SOTA) approaches.