Existing image and video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or were collected in unrealistic controlled environments, which limits the utility of machine learning (ML) models trained on them. We therefore introduce a new dataset, Cattle Visual Behaviors (CVB), consisting of 502 video clips, each fifteen seconds long, captured under natural lighting conditions and annotated with eleven visually perceptible behaviors of grazing cattle. We collect the annotations using the Computer Vision Annotation Tool (CVAT). To make annotation more efficient, we first detect and track the cattle in the videos using pre-trained models. Domain experts then correct the detection and tracking results and label the cattle behaviors in CVAT. This pre-hoc detection and tracking step significantly reduces manual annotation time and effort. We also convert CVB to the atomic visual action (AVA) format and train and evaluate the popular SlowFast action recognition model on it. The preliminary results confirm that we can localize the cattle and recognize their frequently occurring behaviors with confidence. By creating and sharing CVB, we aim to develop improved models capable of accurately recognizing all important behaviors and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification from video data.
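To make the AVA conversion step concrete, the following is a minimal sketch of how one tracked and labeled bounding box might be written as an AVA-style annotation row (video id, timestamp in seconds, box corners normalized to [0, 1], action id, track id). The function name, the pixel-box input, the clip name, and the behavior id used here are illustrative assumptions, not details of the CVB conversion itself.

```python
import csv
import io


def to_ava_row(video_id, timestamp_s, box_px, frame_w, frame_h, action_id, track_id):
    """Build one AVA-style row from a pixel-space box (hypothetical helper).

    box_px is (x1, y1, x2, y2): top-left and bottom-right corners in pixels.
    AVA-style annotations store these corners normalized by frame size.
    """
    x1, y1, x2, y2 = box_px
    return [
        video_id,
        int(timestamp_s),
        f"{x1 / frame_w:.3f}",
        f"{y1 / frame_h:.3f}",
        f"{x2 / frame_w:.3f}",
        f"{y2 / frame_h:.3f}",
        action_id,
        track_id,
    ]


# Example: one animal (track 0) in an assumed 1920x1080 frame at second 7,
# labeled with an assumed behavior id of 3.
row = to_ava_row("clip_0001", 7, (192, 108, 960, 540), 1920, 1080, 3, 0)

# Serialize the row as a single CSV line, as an annotation file would hold it.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
csv_line = buffer.getvalue().strip()
```

An animal showing several behaviors at the same timestamp would simply get one such row per behavior, each sharing the same box and track id.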