3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking

Markerless methods for animal posture tracking have developed rapidly in recent years, but frameworks and benchmarks for tracking large animal groups in 3D are still lacking. To fill this gap, we present 3D-MuPPET, a framework to estimate and track the 3D poses of up to 10 pigeons at interactive speed using multiple camera views. We train a pose estimator to infer 2D keypoints and bounding boxes of multiple pigeons, then triangulate the keypoints to 3D. For identity matching of individuals in all views, we first dynamically match 2D detections to global identities in the first frame, then use a 2D tracker to maintain IDs across views in subsequent frames. We achieve accuracy comparable to a state-of-the-art 3D pose estimator in terms of median error and Percentage of Correct Keypoints. Additionally, we benchmark the inference speed of 3D-MuPPET, with up to 9.45 fps in 2D and 1.89 fps in 3D, and perform a quantitative tracking evaluation, which yields encouraging results. Finally, we showcase two novel applications of 3D-MuPPET. First, we train a model with data of single pigeons and achieve comparable results in 2D and 3D posture estimation for up to 5 pigeons. Second, we show that 3D-MuPPET also works outdoors without additional annotations from natural environments. Both use cases simplify the domain shift to new species and environments, largely reducing the annotation effort needed for 3D posture tracking. To the best of our knowledge, we are the first to present a framework for 2D/3D animal posture and trajectory tracking that works in both indoor and outdoor environments for up to 10 individuals. We hope that the framework can open up new opportunities in studying animal collective behaviour and encourage further developments in 3D multi-animal posture tracking.
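To make the "triangulate the keypoints to 3D" step concrete, the following is a minimal sketch of linear (DLT) triangulation of a single keypoint from several calibrated views. The abstract does not say which triangulation method 3D-MuPPET uses, so this is a generic illustration and not the authors' implementation. The function names, the intrinsics `K`, and the two camera matrices are hypothetical values chosen only for the example.

```python
import numpy as np


def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one keypoint seen in N >= 2 views.

    projections: sequence of 3x4 camera projection matrices (assumed known
                 from calibration).
    points_2d:   sequence of (x, y) pixel coordinates, one per view.
    Returns the 3D point as an array of shape (3,).
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point into pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical two-camera rig: shared intrinsics, second camera shifted along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.1, -0.2, 3.0])  # synthetic keypoint, for illustration only
observations = [project(P1, X_true), project(P2, X_true)]
X_est = triangulate_point([P1, P2], observations)
```

In a multi-animal setting, a routine like this would be applied per keypoint and per individual once the 2D detections in each view have been assigned to the same global identity, which is why identity matching precedes triangulation in the pipeline described above.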
