Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure inspection. Ensuring safe and reliable operation in these human-populated environments demands accurate perception of human poses and actions from an aerial viewpoint. The aerial perspective challenges existing methods with low resolution, steep viewing angles and (self-)occlusion, especially when the application demands models that can run in real time. We train and deploy FlyPose, a lightweight top-down human pose estimation pipeline for aerial imagery. Through multi-dataset training, we achieve an average improvement of 6.8 mAP in person detection across the test sets of Manipal-UAV, VisDrone, HIT-UAV and our custom dataset. For 2D human pose estimation, we report an improvement of 16.3 mAP on the challenging UAV-Human dataset. FlyPose runs with an inference latency of approximately 20 milliseconds, including preprocessing, on a Jetson Orin AGX Developer Kit and is deployed onboard a quadrotor UAV during flight experiments. We also publish FlyPose-104, a small but challenging aerial human pose estimation dataset that includes manual annotations from difficult aerial perspectives: https://github.com/farooqhassaan/FlyPose.