This paper describes the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The constraints are imposed by the task of deriving the lip movements of a synthetic face in real time from the speech signal, by feeding the phonetic string into an articulatory synthesiser. Particular attention has been paid to analysing the interaction between the time evolution model learnt by the multi-layer perceptrons and the transition model imposed by the Viterbi decoder, in different latency conditions. Two experiments were conducted in which the time dependencies in the language model (LM) were controlled by a parameter. The results show a strong interaction between the three factors involved, namely the neural network topology, the length of time dependencies in the LM and the decoder latency.
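To make the notion of decoder latency concrete, the following is a minimal sketch of a Viterbi decoder with a fixed look-ahead. It assumes that the latency constraint is realised by committing the state for frame t as soon as frame t + D has been observed, by backtracking from the currently best-scoring state. This truncation scheme, the function names (`truncated_viterbi`, `backtrack`), the parameter `lookahead` and the toy two-state model are illustrative assumptions, not the implementation or data used in the paper. In the setting described above, the per-frame scores would come from the multi-layer perceptron and the transition scores from the language model.

```python
import math


def backtrack(delta, backptrs, t_end, t_target):
    """Follow back-pointers from the best state at t_end down to t_target."""
    state = max(range(len(delta)), key=lambda s: delta[s])
    for t in range(t_end, t_target, -1):
        state = backptrs[t][state]  # predecessor at frame t - 1
    return state


def truncated_viterbi(log_obs, log_trans, log_init, lookahead):
    """Viterbi decoding that commits frame t once frame t + lookahead is seen.

    log_obs[t][j]   : log score of state j at frame t (hypothetical MLP output)
    log_trans[i][j] : log transition score from state i to state j
    log_init[j]     : log initial score of state j
    """
    n = len(log_init)
    delta, backptrs, decoded = [], [], []
    for t, obs in enumerate(log_obs):
        if t == 0:
            delta = [log_init[j] + obs[j] for j in range(n)]
            backptrs.append([0] * n)  # placeholder, never followed
        else:
            ptr = [max(range(n), key=lambda i, j=j: delta[i] + log_trans[i][j])
                   for j in range(n)]
            delta = [delta[ptr[j]] + log_trans[ptr[j]][j] + obs[j]
                     for j in range(n)]
            backptrs.append(ptr)
        if t >= lookahead:
            # Enough future context is available: commit frame t - lookahead.
            decoded.append(backtrack(delta, backptrs, t, t - lookahead))
    # End of input: flush the frames that never received a full look-ahead.
    last = len(log_obs) - 1
    for target in range(max(0, last - lookahead + 1), last + 1):
        decoded.append(backtrack(delta, backptrs, last, target))
    return decoded


def logs(rows):
    """Convert a table of probabilities to log scores."""
    return [[math.log(p) for p in row] for row in rows]


# Toy model (made-up numbers): two "sticky" states, and a middle frame
# whose local evidence briefly favours state 1.
obs = logs([[0.9, 0.1], [0.25, 0.75], [0.9, 0.1]])
trans = logs([[0.7, 0.3], [0.3, 0.7]])
init = [math.log(0.5), math.log(0.5)]

print(truncated_viterbi(obs, trans, init, lookahead=0))  # [0, 1, 0]
print(truncated_viterbi(obs, trans, init, lookahead=1))  # [0, 0, 0]
```

With no look-ahead, the decoder follows the local evidence at the middle frame. With a single frame of look-ahead, the transition model has enough context to override it. This is a small-scale illustration of how the decoder latency and the time dependencies in the transition model can interact.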