Research workflows for precise video segmentation are often forced to choose between labor-intensive manual curation, costly commercial platforms, and privacy-compromising cloud services, leaving high-fidelity video instance segmentation bottlenecked by annotation effort. We present SAMannot, an open-source, fully local framework that integrates the Segment Anything Model 2 (SAM2) into a human-in-the-loop annotation workflow. To address the high resource requirements of foundation models, we modified the SAM2 dependency and implemented a processing layer that minimizes computational overhead and maximizes throughput, keeping the user interface responsive. Key features include persistent instance identity management, an automated ``lock-and-refine'' workflow with barrier frames, and a mask-skeletonization-based auto-prompting mechanism. SAMannot exports research-ready datasets in YOLO and PNG formats alongside structured interaction logs. Validated on animal behavior tracking use cases and on subsets of the LVOS and DAVIS benchmarks, the tool offers a scalable, private, and cost-effective alternative to commercial platforms for complex video annotation tasks.