Surgical context inference has recently attracted significant attention in robot-assisted surgery because it can facilitate workflow analysis, skill assessment, and error detection. Runtime context inference is challenging, however, because it requires timely and accurate detection of the interactions among the tools and objects in the surgical scene, based on segmentation of the video data. Existing state-of-the-art video segmentation methods are often biased against infrequent classes and fail to keep segmented masks temporally consistent, which can degrade context inference and the accurate detection of critical states. In this study, we address these challenges with a Space-Time Correspondence Network (STCN). STCN is a memory network that performs binary segmentation and minimizes the effects of class imbalance. Its memory bank allows past image and segmentation information to be used, thereby ensuring temporal consistency of the masks. Our experiments on the publicly available JIGSAWS dataset show that STCN achieves superior segmentation performance for objects that are difficult to segment, such as the needle and thread, and improves context inference compared to the state of the art. We also demonstrate that segmentation and context inference can be performed at runtime without compromising performance.
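To make the link between segmentation and context inference concrete, the following minimal sketch shows one way a tool-object interaction could be read off per-class binary masks. The abstract does not specify how interactions are derived, so the contact rule (dilate the tool mask, then count overlapping pixels) and all names here (`dilate`, `overlap_pixels`, `in_contact`, `radius`, `min_pixels`) are illustrative assumptions, not the authors' method.

```python
from typing import List

# A binary mask as nested lists: 1 = pixel belongs to the class, 0 = background.
Mask = List[List[int]]


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Grow the foreground of a binary mask by `radius` pixels (square neighborhood)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def overlap_pixels(a: Mask, b: Mask) -> int:
    """Count pixels that are foreground in both masks."""
    return sum(pa & pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def in_contact(tool: Mask, obj: Mask, radius: int = 1, min_pixels: int = 1) -> bool:
    """Hypothetical contact rule: the tool touches the object if the dilated
    tool mask shares at least `min_pixels` pixels with the object mask."""
    return overlap_pixels(dilate(tool, radius), obj) >= min_pixels


# Toy 3x5 frames: a gripper on the left, a needle either adjacent or far away.
gripper = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 0, 0]]
needle_near = [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
needle_far = [[0, 0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 0]]

contact_near = in_contact(gripper, needle_near)
contact_far = in_contact(gripper, needle_far)
```

Under a rule like this, a thin object such as a needle that is missed or flickers between frames would flip the inferred contact state, which is why mask quality and temporal consistency matter for the downstream context labels.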