Assessing the critical view of safety (CVS) in laparoscopic cholecystectomy requires accurately identifying and localizing key anatomical structures, reasoning about their geometric relationships, and determining the quality of their exposure. Prior work has approached this task by using semantic segmentation as an intermediate step, predicting the CVS from the predicted segmentation masks. While effective, these methods rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, which limits generalization. In this work, we propose a method for CVS prediction in which we first represent a surgical image as a disentangled latent scene graph, then process this representation with a graph neural network. Our graph representations explicitly encode semantic information (object location, class information, and geometric relations) to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose training our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
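To make the graph representation described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each node carries a bounding box, a class label, and a visual feature vector, and that each directed edge stores a simple geometric relation (center offset and box IoU). The names `Node`, `build_edges`, and `message_pass`, the example labels, and the fixed 0.5/0.5 mean aggregation are all hypothetical stand-ins. A real graph neural network layer would use learned weights and would also consume the edge attributes.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Edge = Tuple[float, float, float]        # (dx, dy, IoU); hypothetical relation encoding


@dataclass
class Node:
    box: Box           # object location
    label: str         # class information
    feat: List[float]  # visual feature vector (illustrative values only)


def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def build_edges(nodes: List[Node]) -> Dict[Tuple[int, int], Edge]:
    # Fully connected directed graph; each edge stores the offset from
    # the source center to the target center, plus the box overlap.
    edges: Dict[Tuple[int, int], Edge] = {}
    for i, j in permutations(range(len(nodes)), 2):
        (cxi, cyi), (cxj, cyj) = center(nodes[i].box), center(nodes[j].box)
        edges[(i, j)] = (cxj - cxi, cyj - cyi, iou(nodes[i].box, nodes[j].box))
    return edges


def message_pass(nodes: List[Node], edges: Dict[Tuple[int, int], Edge]) -> List[List[float]]:
    # One round of unweighted mean aggregation over incoming neighbors.
    # Edge attributes are ignored here for brevity (an assumption of this sketch).
    out: List[List[float]] = []
    for i, node in enumerate(nodes):
        nbrs = [nodes[j].feat for (j, k) in edges if k == i]
        if not nbrs:
            out.append(list(node.feat))
            continue
        mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        out.append([0.5 * a + 0.5 * m for a, m in zip(node.feat, mean)])
    return out


# Usage with two illustrative nodes (labels and features are made up).
nodes = [
    Node((0.0, 0.0, 2.0, 2.0), "gallbladder", [1.0, 0.0]),
    Node((1.0, 1.0, 3.0, 3.0), "cystic_duct", [0.0, 1.0]),
]
edges = build_edges(nodes)
updated = message_pass(nodes, edges)
```

In this sketch the semantic attributes (box, label, relation) and the visual features are kept in separate fields, loosely mirroring the disentangled representation the abstract describes. How the method actually combines them is not specified here.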