Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response, and infrastructure inspection. Safe and reliable operation in these human-populated environments demands accurate perception of human poses and actions from an aerial viewpoint. This perspective challenges existing methods through low resolution, steep viewing angles, and (self-)occlusion, especially when the application requires models that run in real time. We train and deploy FlyPose, a lightweight top-down human pose estimation pipeline for aerial imagery. Through multi-dataset training, we achieve an average improvement of 6.8 mAP in person detection across the test sets of Manipal-UAV, VisDrone, HIT-UAV, and our custom dataset. For 2D human pose estimation, we report an improvement of 16.3 mAP on the challenging UAV-Human dataset. FlyPose runs with an inference latency of ~20 milliseconds, including preprocessing, on a Jetson AGX Orin Developer Kit and is deployed onboard a quadrotor UAV during flight experiments. We also publish FlyPose-104, a small but challenging aerial human pose estimation dataset that includes manual annotations from difficult aerial perspectives: https://github.com/farooqhassaan/FlyPose.