In this study, we propose an automated framework for camel farm monitoring with two key contributions: the Unified Auto-Annotation framework and the Fine-Tune Distillation framework. The Unified Auto-Annotation approach combines two models, GroundingDINO (GD) and Segment-Anything-Model (SAM), to automatically annotate raw datasets extracted from surveillance videos. Building on this foundation, the Fine-Tune Distillation framework fine-tunes student models on the auto-annotated dataset. This process transfers knowledge from a large teacher model to a small student model and can be viewed as a variant of Knowledge Distillation. The framework is designed to adapt to specific use cases, which makes it suitable for domain-specific applications. Using our raw dataset collected from Al-Marmoom Camel Farm in Dubai, UAE, and a pre-trained teacher model, GroundingDINO, the Fine-Tune Distillation framework produces a lightweight deployable model, YOLOv8. The framework demonstrates high performance and computational efficiency, enabling real-time object detection. Our code is available at \href{https://github.com/Razaimam45/Fine-Tune-Distillation}{https://github.com/Razaimam45/Fine-Tune-Distillation}.
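To make the hand-off from auto-annotation to student fine-tuning concrete, the following is a minimal sketch of one plausible step: turning a teacher model's box predictions into YOLO-format label lines (`class x_center y_center width height`, normalized to the image size). It is an illustration under stated assumptions, not the authors' implementation. The `Detection` structure, the `to_yolo_lines` helper, the `min_score` threshold of 0.35, and the sample detections are all hypothetical, and the sketch covers only bounding boxes, not the segmentation masks that SAM would contribute.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One teacher-model prediction in pixel coordinates (hypothetical structure)."""
    label: str
    score: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_yolo_lines(detections, image_w, image_h, class_ids, min_score=0.35):
    """Convert teacher detections into YOLO-format label lines.

    Each line is "class x_center y_center width height", with coordinates
    normalized to [0, 1]. Detections below min_score (an assumed threshold)
    or with a label missing from class_ids are skipped.
    """
    lines = []
    for det in detections:
        if det.score < min_score or det.label not in class_ids:
            continue
        x_c = (det.x_min + det.x_max) / 2 / image_w
        y_c = (det.y_min + det.y_max) / 2 / image_h
        w = (det.x_max - det.x_min) / image_w
        h = (det.y_max - det.y_min) / image_h
        lines.append(f"{class_ids[det.label]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical teacher output for one 1280x720 surveillance frame.
frame_detections = [
    Detection("camel", 0.82, 320, 180, 960, 540),
    Detection("camel", 0.20, 0, 0, 64, 36),  # low confidence, dropped
]
labels = to_yolo_lines(frame_detections, 1280, 720, {"camel": 0})
print("\n".join(labels))
```

Label files produced this way, one per frame, are the kind of auto-annotated dataset on which a lightweight student such as YOLOv8 could then be fine-tuned.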