Analyzing animal and human behavior has long been a challenging task in computer vision. Early approaches, from the 1970s to the 1990s, relied on hand-crafted edge detection, segmentation, and low-level features such as color, shape, and texture to locate objects and infer their identities, an inherently ill-posed problem. Behavior analysis in that era typically proceeded by tracking identified objects over time and modeling their trajectories with sparse feature points, which further limited robustness and generalization. A major shift came with the introduction of ImageNet by Deng et al. in 2009, which enabled large-scale visual recognition with deep neural networks and effectively served as a comprehensive visual dictionary. This development allowed object recognition to move beyond complex low-level processing toward learned high-level representations. In this work, we follow this paradigm and build a large-scale Universal Action Space (UAS) from existing labeled human-action datasets. We then use the UAS as the foundation for analyzing and categorizing mammalian and chimpanzee behavior datasets. The source code is released on GitHub at https://github.com/franktpmvu/Universal-Action-Space.