Minimally invasive surgery can benefit significantly from automated surgical tool detection, which enables advanced analysis and assistance. However, the limited availability of annotated data in surgical settings makes it difficult to train robust deep learning models. This paper introduces a novel staged adaptive fine-tuning approach consisting of two steps: a linear probing stage that conditions additional classification layers on a pre-trained CNN-based architecture, and a gradual freezing stage that dynamically reduces the number of fine-tunable layers in order to regulate adaptation to the surgical domain. This strategy reduces network complexity and improves efficiency, requiring only a single training loop and eliminating the need for multiple iterations. We validated our method on the Cholec80 dataset, employing CNN architectures (ResNet-50 and DenseNet-121) pre-trained on ImageNet to detect surgical tools in cholecystectomy endoscopic videos. Our results demonstrate that the method improves detection performance compared to existing approaches and established fine-tuning techniques, achieving a mean average precision (mAP) of 96.4%. To assess its broader applicability, we further evaluated the fine-tuning strategy on the CATARACTS dataset, which covers a distinct domain of minimally invasive ophthalmic surgery, and confirmed its generalizability. These findings suggest that gradual freezing fine-tuning is a promising technique for improving tool presence detection in diverse surgical procedures and may have broader applications in general image classification tasks.
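The two stages described in the abstract can be illustrated with a small scheduling sketch. The abstract does not specify how layers are grouped, in which order they are frozen, or how often freezing occurs, so everything below is an assumption made for illustration: the function name `trainable_blocks`, the parameters `probe_epochs` and `freeze_every`, freezing one backbone block at a time starting from the input side, and a fixed epoch interval. The classification head is assumed to remain trainable throughout and is therefore not part of the returned flags.

```python
from typing import List


def trainable_blocks(epoch: int, num_blocks: int,
                     probe_epochs: int, freeze_every: int) -> List[bool]:
    """Return one flag per backbone block (index 0 = closest to the input).

    True means the block is fine-tuned at this epoch, False means frozen.
    Hypothetical schedule, not the authors' exact procedure.
    """
    if epoch < probe_epochs:
        # Stage 1 (linear probing): the whole pre-trained backbone is frozen,
        # so only the added classification layers are updated.
        return [False] * num_blocks

    # Stage 2 (gradual freezing): start with every block trainable, then
    # freeze one more block (from the input side) every `freeze_every` epochs.
    steps = (epoch - probe_epochs) // freeze_every
    frozen = min(steps, num_blocks)
    return [i >= frozen for i in range(num_blocks)]


if __name__ == "__main__":
    # Print the schedule for a 4-block backbone over 12 epochs.
    for epoch in range(12):
        flags = trainable_blocks(epoch, num_blocks=4,
                                 probe_epochs=2, freeze_every=3)
        print(epoch, flags)
```

In a framework such as PyTorch, flags like these could be applied inside a single training loop by setting `requires_grad` on the parameters of each block at the start of every epoch, which is one way to obtain the single-loop behavior the abstract mentions.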