GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos

Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or, when the two are learned jointly, operate only on short-range temporal resolution. As a result, the benefits of jointly modeling spatio-temporal features together with long-range information are not taken into account. In this paper, we propose a vision transformer-based approach that jointly learns spatio-temporal features directly from a sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer
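
The abstract does not specify how the gate combines the short-term and long-term representations, so the following is only a minimal sketch of one common form of gating: a sigmoid-gated convex combination of two feature vectors. It is not taken from the GLSFormer code. The function name `gated_fusion`, the single scalar gate, and the parameters `gate_weights` and `gate_bias` are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(short_feat, long_feat, gate_weights, gate_bias=0.0):
    """Blend a short-term and a long-term feature vector with one scalar gate.

    Hypothetical illustration only: the gate is a sigmoid over a linear
    projection of the concatenated features, and the output is
    g * short + (1 - g) * long. The paper's actual mechanism may differ.
    """
    if len(short_feat) != len(long_feat):
        raise ValueError("short_feat and long_feat must have the same length")
    concat = list(short_feat) + list(long_feat)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")

    # Scalar gate computed from both feature streams.
    logit = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    g = sigmoid(logit)

    # Convex combination: g near 1 favors short-term, g near 0 favors long-term.
    fused = [g * s + (1.0 - g) * l for s, l in zip(short_feat, long_feat)]
    return g, fused


# With all-zero gate weights and zero bias the gate is 0.5,
# so the fused vector is the element-wise mean of the two inputs.
g, fused = gated_fusion([1.0, 2.0], [3.0, 4.0], gate_weights=[0.0] * 4)
print(g, fused)
```

In a trained model the gate parameters would be learned, and the gate would typically be a vector applied per feature or per token rather than a single scalar; the scalar version is used here only to keep the arithmetic easy to follow.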

