Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing datasets are limited in scale, constrained to a few species, or lack sufficient temporal and geographical diversity, leaving no suitable benchmark for training general-purpose MAT models applicable across wild animal populations. To address this, we introduce SA-FARI, the largest open-source MAT dataset for wild animals. It comprises 11,609 camera trap videos collected over approximately 10 years (2014-2024) from 741 locations across 4 continents, spanning 99 species categories. Each video is exhaustively annotated, culminating in ~46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels. Alongside the task-specific annotations, we publish anonymized camera trap locations for each video. Finally, we present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal prompts. We also compare against vision-only methods developed specifically for wildlife analysis. SA-FARI is the first large-scale dataset to combine high species diversity, multi-region coverage, and high-quality spatio-temporal annotations, offering a new foundation for advancing generalizable multi-animal tracking in the wild. The dataset is available at $\href{https://www.conservationxlabs.com/sa-fari}{\text{conservationxlabs.com/SA-FARI}}$.