Online recognition of gestures is critical for intuitive human-robot interaction (HRI) and for pushing collaborative robotics further into the market, making robots accessible to more people. However, accurate gesture recognition is difficult to achieve in real unstructured environments, where the multisensory data are often distorted and incomplete. This paper introduces an HRI framework to classify large vocabularies of interwoven static gestures (SGs) and dynamic gestures (DGs) captured with wearable sensors. DG features are obtained by applying dimensionality reduction to the raw sensor data (resampling with cubic interpolation, followed by principal component analysis). Experimental tests were conducted using the UC2017 hand gesture dataset, with samples from eight different subjects. The classification models reach an accuracy of 95.6% for a library of 24 SGs with a random forest and 99.3% for 10 DGs with artificial neural networks. These results are equal to or better than those of other commonly used classifiers. Long short-term memory deep networks achieved similar performance in online frame-by-frame classification using raw incomplete data, performing better in terms of accuracy than static models with specially crafted features, but worse in training and inference time. The recognized gestures are used to teleoperate a robot in a collaborative process that consists of preparing a breakfast meal.
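The DG feature extraction described in the abstract (resampling with cubic interpolation, then principal component analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which cubic scheme is used, so Catmull-Rom interpolation is assumed here, and the function names (`resample_cubic`, `fit_pca`, `pca_transform`), the 20 output frames, the 4 sensor channels, the 5 retained components, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def resample_cubic(seq, n_out):
    """Resample a (T, C) sequence to (n_out, C) frames.

    Uses Catmull-Rom cubic interpolation (an assumption; the abstract
    only says "cubic interpolation").
    """
    seq = np.asarray(seq, dtype=float)
    T = seq.shape[0]
    if T < 2:
        return np.repeat(seq, n_out, axis=0)
    # Repeat the end frames so every segment has four control points.
    padded = np.vstack([seq[:1], seq, seq[-1:]])
    pos = np.linspace(0.0, T - 1, n_out)    # fractional frame positions
    i = np.minimum(pos.astype(int), T - 2)  # index of the segment's first frame
    t = (pos - i)[:, None]                  # local parameter in [0, 1]
    p0, p1, p2, p3 = padded[i], padded[i + 1], padded[i + 2], padded[i + 3]
    return 0.5 * (2 * p1
                  + (p2 - p0) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t**2
                  + (3 * (p1 - p2) + p3 - p0) * t**3)

def fit_pca(X, n_components):
    """Return the feature mean and the top principal axes of X (samples x features)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered samples onto the principal axes."""
    return (X - mean) @ components.T

# Synthetic stand-in for variable-length dynamic gestures (hypothetical data:
# 12 gestures, 30 to 79 frames each, 4 sensor channels).
rng = np.random.default_rng(0)
gestures = [rng.normal(size=(rng.integers(30, 80), 4)) for _ in range(12)]

# Fixed-length vectors: 20 frames x 4 channels = 80 values per gesture.
X = np.stack([resample_cubic(g, 20).ravel() for g in gestures])
mean, components = fit_pca(X, 5)
features = pca_transform(X, mean, components)  # shape (12, 5)
```

Resampling gives every gesture the same length regardless of how long it took to perform, so the flattened vectors can be stacked into a single matrix. PCA then compresses that matrix into a small number of features, which a classifier such as a neural network could take as input.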