IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels, Consonants, Words, and Phrases

Several sensing techniques have been proposed for silent speech recognition (SSR); however, many of them require invasive procedures or sensors attached to the skin with adhesive tape or glue, which makes them unsuitable for frequent use in daily life. By contrast, impulse radio ultra-wideband (IR-UWB) radar operates without physical contact with the user's articulators and related body parts, and offers several advantages for SSR: high range resolution, high penetrability, low power consumption, robustness to external light and sound interference, and the ability to be embedded in space-constrained handheld devices. This study demonstrates IR-UWB radar-based contactless SSR using four types of speech stimuli (vowels, consonants, words, and phrases). To this end, a novel speech feature extraction algorithm designed specifically for IR-UWB radar-based SSR is proposed. Each speech stimulus is recognized by applying a classification algorithm to the extracted speech features. Two algorithms, multidimensional dynamic time warping (MD-DTW) and deep neural network-hidden Markov model (DNN-HMM), were compared for the classification task. Additionally, two radar antenna positions, in front of the user's lips and below the user's chin, were compared to determine which one yields higher recognition accuracy. Experimental results demonstrated the efficacy of the proposed speech feature extraction algorithm combined with DNN-HMM for classifying vowels, consonants, words, and phrases. Notably, this study represents the first demonstration of phoneme-level SSR using contactless radar.
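To illustrate the template-matching idea behind one of the two classifiers, the following is a minimal sketch of multidimensional DTW with a nearest-template decision rule. It is not the authors' implementation: the abstract does not specify the radar features, the local cost, or the decision rule, so the Euclidean frame distance, the unconstrained warping path, and the names `md_dtw_distance` and `classify` are all assumptions made for illustration.

```python
import numpy as np


def md_dtw_distance(a, b):
    """DTW distance between two sequences of shape (frames, dims).

    Assumption: the local cost is the Euclidean distance between
    multidimensional feature frames, with no warping-window constraint.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Pairwise Euclidean distance between every frame of a and every frame of b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] holds the cheapest alignment cost of a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]


def classify(query, templates):
    """Return the label of the template closest to query under MD-DTW.

    templates is a hypothetical dict mapping a label to one reference sequence.
    """
    return min(templates, key=lambda label: md_dtw_distance(query, templates[label]))
```

Because the warping path may repeat frames, a slower or faster utterance of the same stimulus can still align closely with its template, which is the property that makes DTW attractive for variable-length speech features.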
