In image-assisted minimally invasive surgery (MIS), understanding the surgical scene is vital for providing real-time feedback to surgeons, evaluating skill, and improving outcomes in collaborative human-robot procedures. Within this context, the challenge lies in accurately detecting and segmenting surgical instruments and estimating depth from high-resolution images, while simultaneously reconstructing the scene in 3D and assigning a detection label to each instrument. To address this challenge, a novel Multi-Task Learning (MTL) network is proposed that performs these tasks concurrently. A key aspect of the approach is overcoming the optimization difficulties of handling multiple tasks at once by integrating an Adversarial Weight Update into the MTL framework. The proposed model achieves 3D reconstruction by combining segmentation, depth estimation, and object detection, thereby enhancing surgical scene understanding, which marks a significant advance over existing studies that lack 3D capabilities. Comprehensive experiments on the EndoVis2018 benchmark dataset show that the model efficiently addresses all three tasks, demonstrating the efficacy of the proposed techniques.
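The abstract does not specify the exact form of the Adversarial Weight Update. As a minimal illustrative sketch only, one common way to balance multiple task losses adversarially is to take a gradient-ascent step on the task weights (so weight shifts toward the currently hardest tasks) and renormalize onto the simplex; the function name and learning rate below are assumptions, not the paper's method:

```python
import numpy as np

def adversarial_weight_update(task_losses, weights, lr=0.1):
    """Illustrative adversarial step on task-loss weights.

    For L(w) = sum_i w_i * loss_i, the gradient w.r.t. w is simply the
    vector of task losses, so an ascent step increases the weight of
    the tasks with the largest current loss. Weights are then clipped
    and renormalized to sum to 1. This is an assumed rule for
    illustration; the paper's actual update is not given in the abstract.
    """
    losses = np.asarray(task_losses, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w + lr * losses          # gradient ascent: grad_w L = losses
    w = np.clip(w, 1e-8, None)   # keep weights positive
    return w / w.sum()           # project back onto the simplex

# Hypothetical losses for the three tasks: segmentation, depth, detection
weights = np.ones(3) / 3
weights = adversarial_weight_update([0.9, 0.4, 0.2], weights)
```

After this step the segmentation task, with the largest loss, receives the largest weight, so the shared backbone is pushed hardest on its worst task at each iteration.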