The recognition of human actions in videos is one of the most active research fields in computer vision. The canonical approach consists of more or less complex preprocessing stages applied to the raw video data, followed by a relatively simple classification algorithm. Here we address the recognition of human actions using the reservoir computing algorithm, which allows us to focus on the classifier stage. We introduce a new training method for the reservoir computer, based on "Timesteps Of Interest", which combines short and long time scales in a simple way. We study the performance of this algorithm on the well-known KTH dataset, using both numerical simulations and a photonic implementation based on a single non-linear node and a delay line. We solve the task with high accuracy and speed, to the point of allowing multiple video streams to be processed in real time. The present work is thus an important step towards developing efficient dedicated hardware for video processing.