Tool tracking in surgical videos is vital in computer-assisted intervention for tasks such as surgeon skill assessment, safety zone estimation, and human-machine collaboration during minimally invasive procedures. The lack of large-scale datasets hampers the application of artificial intelligence in this domain. Current datasets use an overly generic tracking formalization that often lacks surgical context. This deficiency becomes evident when tools move out of the camera's scope: the resulting trajectories are rigid and hinder a realistic representation of the surgery. This paper addresses the need for a more precise and adaptable tracking formalization, tailored to the intricacies of endoscopic procedures, by introducing CholecTrack20, an extensive dataset carefully annotated for multi-class multi-tool tracking. Tracks are annotated under three perspectives, each representing a different way of defining the temporal duration of a tool trajectory: (1) intraoperative, (2) intracorporeal, and (3) visibility within the camera's scope. The dataset comprises 20 laparoscopic videos with over 35,000 frames and 65,000 annotated tool instances, with details on spatial location, category, identity, operator, phase, and surgical visual conditions. This level of detail caters to the evolving assistive requirements within a procedure.
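The three perspectives differ in when a tool's trajectory is considered to end, so the same tool motion can yield different track identities. The following is a minimal illustrative sketch, not the dataset's annotation tooling. It assumes hypothetical per-frame state labels for a single tool (`visible`, `in_body_out_of_view`, `out_of_body`). Under these labels, an intraoperative track persists for the whole procedure. An intracorporeal track restarts only when the tool is withdrawn from the body and re-inserted. A visibility track restarts whenever the tool leaves the camera view.

```python
from typing import List, Optional

# Hypothetical per-frame states for one physical tool. These labels are an
# assumption for illustration, not CholecTrack20's actual annotation schema.
VISIBLE = "visible"
IN_BODY_OUT_OF_VIEW = "in_body_out_of_view"
OUT_OF_BODY = "out_of_body"

PERSPECTIVES = ("intraoperative", "intracorporeal", "visibility")


def assign_track_ids(states: List[str], perspective: str) -> List[Optional[int]]:
    """Return a track ID for each frame where the tool is visible, else None.

    A new ID is opened whenever the trajectory "breaks" under the perspective:
      - "intraoperative": never breaks; one ID for the whole procedure.
      - "intracorporeal": breaks when the tool leaves the body (re-insertion).
      - "visibility": breaks whenever the tool leaves the camera view.
    """
    if perspective not in PERSPECTIVES:
        raise ValueError(f"unknown perspective: {perspective}")

    ids: List[Optional[int]] = []
    current_id = -1  # no trajectory opened yet
    broken = True    # the next visible frame must open a new trajectory
    for state in states:
        if state == VISIBLE:
            if broken:
                current_id += 1
                broken = False
            ids.append(current_id)
            continue
        # The tool is not visible in this frame, so no box and no ID.
        ids.append(None)
        if perspective == "visibility":
            broken = True
        elif perspective == "intracorporeal" and state == OUT_OF_BODY:
            broken = True
        # "intraoperative": the trajectory never breaks after first appearance.
    return ids


if __name__ == "__main__":
    # Tool is seen, leaves the view, returns, is withdrawn, then re-inserted.
    states = [VISIBLE, VISIBLE, IN_BODY_OUT_OF_VIEW, VISIBLE,
              OUT_OF_BODY, VISIBLE, VISIBLE]
    for p in PERSPECTIVES:
        print(p, assign_track_ids(states, p))
```

For this sequence, the intraoperative view keeps a single ID (0). The intracorporeal view opens a second ID only after re-insertion. The visibility view opens a new ID after each exit from the camera view, for three IDs in total. The coarser perspectives therefore demand that a tracker re-identify tools across gaps where they are not visible.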