We present a novel approach for action recognition in UAV videos. Our formulation is designed to handle the occlusion and viewpoint changes caused by the movement of a UAV. We use mutual information to compute and align, in the temporal domain, the regions corresponding to human action or motion. This enables our recognition model to learn from the key features associated with the motion. We also propose a novel frame sampling method that uses joint mutual information to select the most informative frame sequence in UAV videos. We have integrated our approach with X3D and evaluated its performance on multiple datasets. In practice, we achieve an 18.9% improvement in Top-1 accuracy over current state-of-the-art methods on UAV-Human (Li et al., 2021), a 7.3% improvement on Drone-Action (Perera et al., 2019), and a 7.16% improvement on NEC Drones (Choi et al., 2020).
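To illustrate the frame-sampling idea, the following is a minimal sketch and not the implementation evaluated in this work. It assumes grayscale frames stored as NumPy arrays with intensities in [0, 256), estimates mutual information from a joint intensity histogram, and greedily picks frames that share little information with those already chosen. The function names `mutual_information` and `sample_frames`, the bin count, and the greedy pairwise criterion (a simple stand-in for joint mutual information) are all assumptions made for the example.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Histogram estimate of I(A; B), in nats, for two equally sized grayscale frames."""
    joint, _, _ = np.histogram2d(
        a.ravel(), b.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = joint / joint.sum()              # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of frame a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of frame b, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def sample_frames(frames, k, bins=16):
    """Greedily choose k frame indices with low pairwise MI (hypothetical heuristic).

    Assumption: pairwise MI with already selected frames is used as a
    redundancy score; this approximates, but is not, a joint-MI criterion.
    """
    n = len(frames)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)
    selected = [0]  # start from the first frame
    # redundancy[i] = largest MI between frame i and any selected frame
    redundancy = np.array(
        [mutual_information(frames[0], f, bins) for f in frames], dtype=float
    )
    while len(selected) < k:
        redundancy[selected] = np.inf      # never pick the same frame twice
        nxt = int(np.argmin(redundancy))   # least redundant remaining frame
        selected.append(nxt)
        new = np.array(
            [mutual_information(frames[nxt], f, bins) for f in frames], dtype=float
        )
        redundancy = np.maximum(redundancy, new)
    return sorted(selected)                # restore temporal order
```

Calling `sample_frames(frames, k)` on a list of decoded frames returns the chosen indices in temporal order, which can then be used to build the clip passed to a recognition backbone. The greedy loop evaluates mutual information once per frame for each selected frame, so its cost grows with the product of `k` and the number of frames.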