This study advances surgical phase recognition in arthroscopic procedures, specifically anterior cruciate ligament (ACL) reconstruction, by introducing the first arthroscopy dataset and a novel transformer-based model. We aim to establish a benchmark for arthroscopic surgical phase recognition by leveraging spatio-temporal features to address the specific challenges of arthroscopic videos, including a limited field of view, occlusions, and visual distortions. We developed the ACL27 dataset, comprising 27 videos of ACL surgeries, each labeled with surgical phases. Our model uses a transformer-based architecture that performs temporal-aware frame-wise feature extraction with a ResNet-50 and transformer layers. This approach integrates spatio-temporal features and introduces a Surgical Progress Index (SPI) to quantify surgery progression. We evaluated the model using accuracy, precision, recall, and the Jaccard index on the ACL27 and Cholec80 datasets. The proposed model achieved an overall accuracy of 72.91% on ACL27. On Cholec80, it achieved an accuracy of 92.4%, comparable to state-of-the-art methods. The SPI showed an output error of 10.6% on ACL27 and 9.86% on Cholec80, indicating reliable estimation of surgery progression. This study contributes a dataset and a transformer-based model for surgical phase recognition in arthroscopy. The results support the model's effectiveness and its generalizability across the two datasets, highlighting its potential to improve surgical training, real-time assistance, and operational efficiency in orthopedic surgery. The publicly available dataset and code will facilitate future research and development in this field.
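The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes frame-wise integer phase labels and computes precision, recall, and the Jaccard index per phase over a single label sequence. The function name `phase_metrics`, the toy labels, and the convention of scoring an absent phase as 0.0 are illustrative assumptions. The abstract does not state how the scores are averaged across phases or videos, so this is not necessarily the authors' evaluation protocol.

```python
from typing import Dict, List, Sequence, Union


def phase_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], num_phases: int
) -> Dict[str, Union[float, List[float]]]:
    """Frame-wise accuracy plus per-phase precision, recall, and Jaccard index.

    Hypothetical helper for illustration; labels are integers in [0, num_phases).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one frame is required")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precision: List[float] = []
    recall: List[float] = []
    jaccard: List[float] = []
    for phase in range(num_phases):
        tp = sum(t == phase and p == phase for t, p in pairs)
        fp = sum(t != phase and p == phase for t, p in pairs)
        fn = sum(t == phase and p != phase for t, p in pairs)
        # Assumption: a phase that never occurs and is never predicted scores 0.0.
        precision.append(tp / (tp + fp) if tp + fp else 0.0)
        recall.append(tp / (tp + fn) if tp + fn else 0.0)
        jaccard.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "jaccard": jaccard,
    }


# Toy example: six frames, three phases (made-up labels, not ACL27 data).
result = phase_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_phases=3)
```

The Jaccard index for a phase is the number of frames where prediction and ground truth agree on that phase, divided by the number of frames where either one assigns it. It therefore penalizes both missed and spurious frames, which is why it is usually lower than precision or recall taken alone.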