Following the successful debut of polyp detection and characterization, more advanced automation tools are being developed for colonoscopy. These new automation tasks, such as quality metric assessment or report generation, require an understanding of the procedure flow, which includes, among other things, activities, events, and anatomical landmarks. In this work, we present a method for automatic semantic parsing of colonoscopy videos. The method uses a novel deep learning (DL) multi-label temporal segmentation model trained in both supervised and unsupervised regimes. We evaluate the accuracy of the method on a test set of over 300 annotated colonoscopy videos and use ablation to explore the relative importance of the method's various components.