Semantic segmentation relies heavily on large amounts of annotated image data, and annotating such data is time-consuming and resource-intensive, especially in the medical domain. Active Learning (AL) is a popular approach to reducing this burden: it iteratively selects images for annotation so as to improve model performance. For video data, the selection should account for both model uncertainty and the temporal nature of the sequences. This work proposes COWAL (COrrelation-aWare Active Learning), a novel AL strategy for surgical video segmentation. Our approach projects images into a latent space that has been fine-tuned using contrastive learning and then selects a fixed number of representative images from local clusters of video frames. We demonstrate the effectiveness of this approach on two video datasets of surgical instruments and three real-world video datasets. The datasets and code will be made publicly available upon receiving the necessary approvals.
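To make the selection step concrete, the following is a minimal sketch of picking a fixed number of representative frames from local clusters in a latent space. It is not the authors' implementation. It assumes that the frame embeddings are already available as an array in temporal order (here random vectors stand in for the output of a contrastively fine-tuned encoder), that "local clusters" can be approximated by contiguous, near-equal temporal chunks, and that "representative" means closest to the chunk centroid. The function name `select_representatives` and its parameters are hypothetical, and model uncertainty is not modeled.

```python
import numpy as np


def select_representatives(embeddings, n_clusters, per_cluster=1):
    """Return sorted frame indices closest to the centroid of each temporal chunk.

    Hypothetical helper: an illustration of cluster-wise representative
    selection, not the procedure used in the paper.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n_frames = embeddings.shape[0]

    # Assumption: local clusters are contiguous chunks of the frame sequence.
    chunks = np.array_split(np.arange(n_frames), n_clusters)

    selected = []
    for idx in chunks:
        if idx.size == 0:
            continue  # more clusters than frames: skip empty chunks
        centroid = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        # Keep the per_cluster frames nearest to the chunk centroid.
        nearest = np.argsort(dists, kind="stable")[:per_cluster]
        selected.extend(int(i) for i in idx[nearest])
    return sorted(selected)


# Usage: 120 frames with 16-dimensional stand-in embeddings.
rng = np.random.default_rng(0)
frames = rng.normal(size=(120, 16))
picked = select_representatives(frames, n_clusters=6, per_cluster=2)
print(picked)
```

With 6 chunks and 2 frames per chunk, the annotation budget per round is fixed at 12 frames, spread evenly over the video rather than concentrated in one visually similar segment.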