As intelligent systems become increasingly important in our daily lives, new ways of interaction are needed. Classical user interfaces pose problems for people with physical impairments and are not always practical or convenient. Gesture recognition is an alternative, but it is often not responsive enough when conventional cameras are used. This work proposes a Spiking Convolutional Neural Network that processes event and depth data for gesture recognition. The network is simulated using the open-source neuromorphic computing framework LAVA for offline training and evaluation on an embedded system. For the evaluation, three open-source data sets are used. Since none of these covers the bimodal input used here, a new data set with synchronized event and depth data was recorded. The results show that temporal encoding of depth information is viable and that modality fusion, even on differently encoded data, benefits network performance and generalization.