Recording surgery in operating rooms is essential for education and for the evaluation of medical treatment. However, recording the desired targets, such as the surgical field, surgical tools, or the doctor's hands, is difficult because these targets are heavily occluded during surgery. We use a recording system in which multiple cameras are embedded in the surgical lamp, and we assume that at least one camera records the target without occlusion at any given time. Because the embedded cameras produce multiple video sequences, we address the task of selecting the camera with the best view of the surgery. Unlike the conventional method, which selects the camera based on the area of the surgical field, we propose a deep neural network that predicts camera selection probabilities from the multiple video sequences, trained with expert annotations as supervision. We created a dataset of recordings of six different types of plastic surgery and annotated it with camera switching. Our experiments show that our approach successfully switches between cameras and outperforms three baseline methods.
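The selection step described above can be illustrated with a minimal sketch. It assumes that per-frame scores for each camera are already available, either the selection probabilities predicted by a network or, for the conventional baseline, the measured area of the surgical field. The function name `select_camera_per_frame` and the example values are hypothetical and are not taken from the paper; the sketch simply picks the highest-scoring camera in each frame and does not reproduce the proposed network.

```python
from typing import List, Sequence


def select_camera_per_frame(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each frame, the index of the camera with the highest score.

    scores[t][c] is the score of camera c at frame t. This is an assumed
    input format: it could hold predicted selection probabilities or the
    surgical-field area used by an area-based baseline.
    """
    selected = []
    for t, frame_scores in enumerate(scores):
        if len(frame_scores) == 0:
            raise ValueError(f"frame {t} has no camera scores")
        # Argmax over camera indices; ties resolve to the lowest index.
        best = max(range(len(frame_scores)), key=lambda c: frame_scores[c])
        selected.append(best)
    return selected


# Hypothetical probabilities for 3 cameras over 4 frames.
probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
print(select_camera_per_frame(probs))  # [0, 0, 1, 2]
```

A per-frame argmax like this can switch cameras very often when scores are close; whether and how switching is smoothed is not stated in the abstract, so none is applied here.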