Ultrasound-based hand movement estimation is a crucial area of research with applications in human-machine interaction. Forearm ultrasound provides detailed information about changes in muscle morphology during hand movement, which can be used to estimate hand gestures. Previous work has focused on analyzing two-dimensional (2D) ultrasound image frames with techniques such as convolutional neural networks (CNNs). However, such 2D techniques do not capture temporal features from segments of ultrasound data corresponding to continuous hand movements. This study uses 3D CNN-based techniques to capture spatio-temporal patterns within ultrasound video segments for gesture recognition. We compared the performance of a 2D convolution-based network against (2+1)D convolution-based, 3D convolution-based, and our proposed networks. Our method improved gesture classification accuracy from 96.5 +/- 2.3% with a network trained on 2D convolution layers to 98.8 +/- 0.9%. These results demonstrate the advantages of using ultrasound video snippets to improve hand gesture classification performance.
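The distinction between the 2D, (2+1)D, and 3D convolution variants compared above can be illustrated with a toy numpy sketch (this is not the paper's actual architecture; kernel and video sizes are arbitrary). A 3D kernel slides jointly over time and space, a (2+1)D block factorizes that into a spatial pass followed by a temporal pass, and a 2D kernel (temporal extent 1) treats each frame independently, which is why it misses temporal features:

```python
import numpy as np

def conv3d_valid(x, k):
    """Naive 'valid' 3D cross-correlation over a (T, H, W) volume
    with a (t, h, w) kernel."""
    T, H, W = x.shape
    t, h, w = k.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for l in range(out.shape[2]):
                out[i, j, l] = np.sum(x[i:i + t, j:j + h, l:l + w] * k)
    return out

# A toy "ultrasound video snippet": 8 frames of 16x16 pixels.
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16, 16))

# Full 3D kernel: mixes space and time jointly.
k3d = rng.standard_normal((3, 3, 3))
y3d = conv3d_valid(video, k3d)  # shape (6, 14, 14)

# (2+1)D factorization: 1x3x3 spatial pass, then 3x1x1 temporal pass.
k_spatial = rng.standard_normal((1, 3, 3))
k_temporal = rng.standard_normal((3, 1, 1))
y2p1d = conv3d_valid(conv3d_valid(video, k_spatial), k_temporal)  # (6, 14, 14)

# 2D kernel (temporal extent 1): frames processed independently,
# so no temporal features are captured.
k2d = rng.standard_normal((1, 3, 3))
y2d = conv3d_valid(video, k2d)  # shape (8, 14, 14)
```

Note how the 3D and (2+1)D outputs shrink along the time axis (they aggregate across frames), while the 2D output keeps all 8 frames because each is convolved in isolation.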