Markerless methods for animal posture tracking have advanced considerably in recent years, but frameworks and benchmarks for tracking large animal groups in 3D are still lacking. To address this gap, we present 3D-MuPPET, a framework to estimate and track the 3D poses of up to 10 pigeons at interactive speed using multiple views. We train a pose estimator to infer 2D keypoints and bounding boxes of multiple pigeons, then triangulate the keypoints to 3D. For correspondence matching, we first dynamically match 2D detections to global identities in the first frame, then use a 2D tracker to maintain correspondences across views in subsequent frames. We achieve accuracy comparable to a state-of-the-art 3D pose estimator in terms of Root Mean Square Error (RMSE) and Percentage of Correct Keypoints (PCK). We also showcase a novel use case in which our model, trained with data of single pigeons, provides comparable results on data containing multiple pigeons. This can simplify the transfer to new species, because annotating single-animal data is less labour-intensive than annotating multi-animal data. Additionally, we benchmark the inference speed of 3D-MuPPET, reaching up to 10 fps in 2D and 1.5 fps in 3D, and perform a quantitative tracking evaluation, which yields encouraging results. Finally, we show that 3D-MuPPET also works in natural environments without fine-tuning the model on additional annotations. To the best of our knowledge, we are the first to present a framework for 2D/3D posture and trajectory tracking that works in both indoor and outdoor environments.
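The abstract does not state which triangulation method or exact metric definitions 3D-MuPPET uses. As a minimal sketch, assuming calibrated cameras with known 3×4 projection matrices, the following shows standard linear (DLT) triangulation of one keypoint observed in several views, together with one common definition of RMSE and PCK. The function names, the `threshold` argument and the synthetic two-camera setup are hypothetical illustrations, not taken from the 3D-MuPPET implementation.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from N >= 2 views with the linear DLT method.

    projections: list of 3x4 camera projection matrices (assumed calibrated).
    points_2d: list of (x, y) pixel coordinates of the same keypoint, one per view.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view adds two linear constraints on the homogeneous point X:
        # x * (P[2] @ X) - P[0] @ X = 0 and y * (P[2] @ X) - P[1] @ X = 0.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenise


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def rmse(pred, gt):
    """Root mean square of per-keypoint Euclidean errors (one common definition)."""
    err = np.linalg.norm(pred - gt, axis=-1)
    return float(np.sqrt(np.mean(err ** 2)))


def pck(pred, gt, threshold):
    """Fraction of keypoints whose Euclidean error is below `threshold`."""
    err = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean(err < threshold))


# Hypothetical synthetic setup: two pinhole cameras sharing intrinsics K,
# the second camera centred 1 unit along +x from the first.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 5.0])
X_est = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free projections the DLT system has an exact null space, so `X_est` recovers `X_true` up to floating-point error. With real detections, each additional view adds two rows to `A` and the SVD returns the least-squares estimate.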