Surgical phase recognition is a challenging but necessary task for developing context-aware intelligent systems that support medical personnel in delivering better patient care and managing operating rooms effectively. In this paper, we present, for the first time, a surgical phase recognition framework that applies a Multi-Stage Temporal Convolution Network to speech and X-Ray images. We evaluate the proposed approach on our own dataset of 31 port-catheter placement operations and report a frame-wise accuracy of 82.56\% across eight surgical phases. Additionally, we investigate design choices in the temporal model and solutions to the class-imbalance problem. Our experiments demonstrate that speech and X-Ray data can be used effectively for surgical phase recognition, providing a foundation for developing speech assistants in the operating rooms of the future.