Analyzing laparoscopic surgery videos is a complex, multifaceted challenge with applications in surgical training, intra-operative complication prediction, and post-operative assessment. Identifying crucial events in these videos is a prerequisite for most of these applications. In this paper, we introduce a comprehensive dataset tailored for relevant event recognition in laparoscopic gynecology videos. The dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications. To validate the precision of our annotations, we assess event recognition performance using several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid transformer architecture, coupled with a customized training-inference framework, to recognize four specific events in laparoscopic surgery videos. By leveraging Transformer networks, the proposed architecture exploits inter-frame dependencies to counteract the adverse effects of relevant content occlusion, motion blur, and surgical scene variation, thus significantly enhancing event recognition accuracy. Moreover, we present a frame sampling strategy designed to handle variations in surgical scenes and surgeons' skill levels, resulting in event recognition with high temporal resolution. Through a series of extensive experiments, we empirically demonstrate the superiority of the proposed methodology over conventional CNN-RNN architectures in event recognition.