Gestures are an important medium of communication between humans and machines. The overwhelming majority of existing gesture recognition methods are tailored to scenarios in which the human and the machine are very close to each other. This short-distance assumption does not hold for several types of interaction, for example gesture-based interaction with a floor-cleaning robot or a drone. Methods designed for short-distance recognition perform poorly at long distances because the gesture occupies only a small portion of the input data. Their performance degrades further in resource-constrained settings, where they cannot effectively focus their limited compute on the gesturing subject. We propose a novel, accurate, and efficient method for recognizing gestures from longer distances. It uses a dynamic neural network to select features from the gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features that matter for gesture recognition while discarding background features early, making it more compute-efficient than other techniques. We demonstrate our method on the LD-ConGR long-distance dataset, where it outperforms previous state-of-the-art methods in both recognition accuracy and compute efficiency.
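The idea of selecting features from gesture-containing regions and discarding the background early can be illustrated with a minimal sketch. This is not the architecture proposed here: the abstract does not specify how regions are scored or selected, so the function name `select_gesture_features`, the top-k selection rule, the `keep_ratio` parameter, and the use of feature magnitude as a relevance score are all assumptions made for illustration. In a real dynamic network the scores would more plausibly come from a small learned module.

```python
import numpy as np


def select_gesture_features(features, scores, keep_ratio=0.25):
    """Keep only the highest-scoring spatial locations of a feature map.

    Hypothetical helper, not the paper's method.
    features: (H, W, C) array of per-location feature vectors.
    scores:   (H, W) array of per-location relevance scores.
    Returns (kept, indices): kept has shape (K, C), indices has shape (K, 2)
    holding the (row, col) of each kept location in row-major order.
    """
    h, w, c = features.shape
    # Number of locations to keep, clamped to the valid range [1, H*W].
    k = min(h * w, max(1, int(round(keep_ratio * h * w))))
    flat_scores = scores.reshape(-1)
    # argpartition places the k largest scores in the last k slots
    # (in unspecified order); sorting restores row-major order.
    top = np.sort(np.argpartition(flat_scores, -k)[-k:])
    kept = features.reshape(h * w, c)[top]
    rows, cols = np.divmod(top, w)
    return kept, np.stack([rows, cols], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 8x8 feature map: weak background noise plus a small
    # 2x2 "gesture" region with strong activations.
    features = rng.normal(0.0, 0.01, size=(8, 8, 4))
    features[2:4, 5:7, :] += 1.0
    # Assumed stand-in for a learned scoring module: feature magnitude.
    scores = np.linalg.norm(features, axis=-1)
    kept, indices = select_gesture_features(features, scores, keep_ratio=4 / 64)
    print(kept.shape)
    print(indices.tolist())
```

In this toy setting, later layers would operate on 4 of the 64 spatial locations, which is where the compute saving would come from when the gesturing subject covers only a small part of the frame.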