Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation. A common first step in behavioral analysis is extracting relevant keypoints on animals, known as pose estimation. However, reliable pose inference currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that together enable a new method, called SuperAnimal, for developing and deploying deep learning models that require no additional human labels or model training. SuperAnimal allows video inference on over 45 species with only two global classes of animal pose models. When fine-tuning is needed, we show that SuperAnimal models are 10$\times$ more data efficient and outperform prior transfer learning approaches. Moreover, we provide a new video-adaptation method that performs unsupervised refinement of videos, and we illustrate the utility of our model in behavioral classification. Collectively, this work presents a data-efficient, plug-and-play solution for behavioral analysis.