Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation. A common first step in behavioral analysis is extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that together enable a new method, called SuperAnimal, for developing and deploying deep learning models that require no additional human labels or model training. SuperAnimal allows video inference on over 45 species with only two global classes of animal pose models. When the models do need fine-tuning, we show that SuperAnimal models are 10$\times$ more data-efficient than, and outperform, prior transfer-learning-based approaches. Moreover, we provide an unsupervised video-adaptation method to refine keypoints in videos. We illustrate the utility of our models for behavioral classification in mice and gait analysis in horses. Together, these contributions offer a data-efficient solution for animal pose estimation in downstream behavioral analysis.