The study and analysis of human movement have been fundamental to many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. Previous speech motor models were proposed to explain how speech movement is produced and how the resulting speech varies when certain parameters are changed. However, such models do not support the inverse approach, in which the muscular response parameters and the subject's age are derived from real continuous speech. In the handwriting field, by contrast, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech-kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. The kinematic theory of rapid human movements and its associated Sigma-lognormal model are applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech in response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR TIMIT database and the (German) Saarbrucken Voice Database, which include people of different ages, with and without laryngeal pathologies, corroborate, on the one hand, the link between the extracted parameters and aging and, on the other, the proportion between the first and second formants required to apply the kinematic theory of rapid human movements. The results should drive innovative developments in the modeling and understanding of speech kinematics.
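To make the Sigma-lognormal description more concrete, the following minimal Python sketch evaluates the velocity of a movement built from lognormal strokes, using the formulation commonly given in the handwriting literature. It is an illustration under stated assumptions, not the authors' implementation: the names `Stroke`, `speed`, `angle`, and `velocity` are hypothetical, the parameter values are arbitrary, and the formant-to-movement transformation mentioned above is not shown because the abstract does not specify it.

```python
import math
from dataclasses import dataclass


@dataclass
class Stroke:
    """One lognormal stroke (hypothetical container, for illustration only)."""
    D: float        # amplitude: area under the speed curve
    t0: float       # onset time of the neuromotor command
    mu: float       # log-time delay of the neuromuscular response
    sigma: float    # log-response time (spread of the lognormal)
    theta_s: float  # starting direction, in radians
    theta_e: float  # ending direction, in radians


def speed(s: Stroke, t: float) -> float:
    """Lognormal speed profile of one stroke; zero before the onset t0."""
    if t <= s.t0:
        return 0.0
    x = t - s.t0
    z = (math.log(x) - s.mu) / s.sigma
    return s.D / (s.sigma * math.sqrt(2 * math.pi) * x) * math.exp(-0.5 * z * z)


def angle(s: Stroke, t: float) -> float:
    """Direction of one stroke, moving from theta_s to theta_e in
    proportion to the fraction of the stroke already travelled."""
    if t <= s.t0:
        return s.theta_s
    z = (math.log(t - s.t0) - s.mu) / (s.sigma * math.sqrt(2))
    return s.theta_s + (s.theta_e - s.theta_s) * 0.5 * (1 + math.erf(z))


def velocity(strokes, t: float):
    """Vector sum of all strokes at time t, returned as (vx, vy)."""
    vx = sum(speed(s, t) * math.cos(angle(s, t)) for s in strokes)
    vy = sum(speed(s, t) * math.sin(angle(s, t)) for s in strokes)
    return vx, vy


if __name__ == "__main__":
    # Two overlapping strokes with arbitrary, illustrative parameters.
    strokes = [
        Stroke(D=1.0, t0=0.00, mu=-1.5, sigma=0.3, theta_s=0.0, theta_e=0.4),
        Stroke(D=0.6, t0=0.15, mu=-1.4, sigma=0.25, theta_s=1.2, theta_e=0.9),
    ]
    for k in range(0, 60, 10):
        t = k / 100
        print(t, velocity(strokes, t))
```

In the inverse setting described in the abstract, the per-stroke parameters would be estimated from an observed trajectory rather than chosen by hand; the sketch only covers the forward direction.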