Sign language is a visual language used by the deaf and hard-of-hearing community to communicate. However, most recognition methods based on monocular cameras suffer from low accuracy and poor robustness: a method that performs well on one dataset may degrade sharply on another with different interference, because it cannot extract effective features. To address these problems, we propose a sign language recognition network that fuses hand skeleton features with facial expression information. In particular, we introduce a hand skeleton feature extraction method based on coordinate transformation to describe the shape of the hand more accurately. By further incorporating facial expression information, the network improves both the accuracy and the robustness of sign language recognition, as verified on A Dataset for Argentinian Sign Language and SEU's Chinese Sign Language Recognition Database (SEUCSLRD).
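The abstract does not spell out the coordinate transformation, but a common way to make hand skeleton features invariant to camera position is to re-express the detected keypoints in a local, hand-centered frame. The sketch below is only an illustration of that idea, not the paper's exact method; the 21-point MediaPipe-style landmark ordering (index 0 = wrist, index 9 = middle-finger MCP joint) is an assumption.

```python
import numpy as np

def normalize_hand_skeleton(keypoints: np.ndarray) -> np.ndarray:
    """Map 2-D hand keypoints into a translation-, scale-, and
    rotation-invariant local frame.

    keypoints: (21, 2) array of (x, y) pixel coordinates. Index 0 is
    assumed to be the wrist and index 9 the middle-finger MCP joint
    (MediaPipe-style ordering -- an assumption, not the paper's spec).
    """
    pts = np.asarray(keypoints, dtype=float)
    # 1. Translate: the wrist becomes the origin of the local frame.
    pts = pts - pts[0]
    # 2. Scale: palm length (wrist -> middle MCP) becomes 1, removing
    #    dependence on the hand's distance from the camera.
    palm = np.linalg.norm(pts[9])
    if palm > 0:
        pts = pts / palm
    # 3. Rotate: align the wrist -> middle-MCP vector with the +y axis,
    #    removing in-plane hand rotation.
    angle = np.arctan2(pts[9, 0], pts[9, 1])  # deviation from +y axis
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return pts @ rot.T
```

After this transformation the same hand shape yields (approximately) the same feature vector regardless of where the hand appears in the image, which is one way such a coordinate transformation can improve robustness across datasets.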