We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2 designed for accurate, real-time semantic segmentation of cataract surgery videos. Positioned at the intersection of computer vision and medical robotics, CataractSAM-2 provides the precise intraoperative perception that robotic-assisted and computer-guided surgical systems require. To alleviate the burden of manual labeling, we also introduce an interactive annotation framework that combines sparse prompts with video-based mask propagation. This tool significantly reduces annotation time and facilitates the scalable creation of high-quality ground-truth masks, accelerating dataset development for ocular anterior segment surgeries. We further demonstrate the model's strong zero-shot generalization to glaucoma trabeculectomy procedures, which supports its cross-procedural utility and potential for broader surgical applications. The trained model and annotation toolkit are released as open-source resources, establishing CataractSAM-2 as a foundation for expanding anterior segment ophthalmic surgical datasets and advancing real-time AI-driven solutions in medical robotics and surgical video understanding.