Semantic segmentation is a complex task that relies heavily on large amounts of annotated image data. Annotating such data is time-consuming and resource-intensive, especially in the medical domain. Active Learning (AL) is a popular approach that reduces this burden by iteratively selecting the images to annotate so as to improve model performance. For video data, the selection should account for both the model uncertainty and the temporal nature of the sequences. This work proposes a novel AL strategy for surgery video segmentation, \COALSamp{} (COrrelation-aWare Active Learning). Our approach projects images into a latent space that has been fine-tuned using contrastive learning and then selects a fixed number of representative images from local clusters of video frames. We demonstrate the effectiveness of this approach on two video datasets of surgical instruments and three real-world video datasets. The datasets and code will be made publicly available upon receiving the necessary approvals.
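The selection step described in the abstract (a fixed number of representative frames per local cluster of video frames) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the method itself: the abstract does not say how local clusters are formed or how representatives are chosen, so this sketch uses fixed-length temporal windows as the clusters and picks the frames closest to each window's mean embedding. The function name `select_representative_frames` and its parameters `window_size` and `per_window` are hypothetical, and the contrastive fine-tuning that would produce the embeddings is not shown.

```python
import numpy as np


def select_representative_frames(embeddings, window_size, per_window=1):
    """Pick frame indices to annotate from a temporally ordered video.

    Assumptions (not taken from the source): local clusters are
    consecutive windows of `window_size` frames, and a frame is
    "representative" if its embedding lies close to the window mean.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n_frames = embeddings.shape[0]
    selected = []
    for start in range(0, n_frames, window_size):
        window = embeddings[start:start + window_size]
        centroid = window.mean(axis=0)
        # Euclidean distance of every frame in the window to the centroid.
        dists = np.linalg.norm(window - centroid, axis=1)
        # The last window may be shorter than `per_window`.
        k = min(per_window, len(window))
        closest = np.argsort(dists, kind="stable")[:k]
        # Convert window-local positions back to global frame indices.
        selected.extend(sorted(start + int(i) for i in closest))
    return selected


# Toy usage: random vectors stand in for latent-space projections of
# 12 frames; one frame is picked from each window of 4 frames.
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(12, 4))
picked = select_representative_frames(toy_embeddings, window_size=4)
```

Because neighbouring video frames are highly correlated, picking a fixed budget per local window spreads the annotation effort across the sequence instead of concentrating it on a few near-duplicate frames. A full AL loop would additionally weigh model uncertainty, which this sketch leaves out.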