Automatic activity detection is an important component for developing technologies that enable next-generation surgical devices and workflow monitoring systems. In many applications, the videos of interest are long and include several activities; hence, deep models designed for such purposes consist of a backbone and a temporal sequence modeling architecture. In this paper, we investigate both state-of-the-art activity recognition models and temporal models to find the architectures that yield the highest performance. We first benchmark these models on a large-scale activity recognition dataset recorded in the operating room, with over 800 full-length surgical videos. However, since most other medical applications lack such a large dataset, we further evaluate our models on the Cholec80 surgical phase segmentation dataset, consisting of only 40 training videos. For backbone architectures, we investigate both 3D ConvNets and the most recent transformer-based models; for temporal modeling, we include temporal ConvNets, RNNs, and transformer models for a comprehensive and thorough study. We show that even with limited labeled data, we can outperform existing work by benefiting from models pre-trained on other tasks.
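The two-stage design described above, a frame-level backbone followed by a temporal sequence model, can be sketched as follows. This is a minimal illustrative NumPy sketch: the "backbone" and "temporal model" here are toy stand-ins (a fixed random projection and a causal moving average) chosen only to show the data flow, not any of the architectures evaluated in the paper.

```python
import numpy as np

def extract_frame_features(frames, feat_dim=8):
    # Stand-in for a pretrained backbone (e.g. a 3D ConvNet or video
    # transformer): project each flattened frame with a fixed random
    # matrix. Shapes only; no real visual features are computed.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((frames.shape[1], feat_dim))
    return frames @ W  # (T, feat_dim)

def temporal_model(features, window=5):
    # Stand-in for the temporal stage (temporal ConvNet / RNN /
    # transformer): a causal moving average over the feature sequence,
    # so each output frame aggregates context from preceding frames.
    T, _ = features.shape
    out = np.zeros_like(features)
    for t in range(T):
        lo = max(0, t - window + 1)
        out[t] = features[lo:t + 1].mean(axis=0)
    return out

# Toy "video": 30 frames, each a 16-dim flattened frame vector.
video = np.random.default_rng(1).standard_normal((30, 16))
feats = extract_frame_features(video)   # per-frame embeddings, (30, 8)
context = temporal_model(feats)         # temporally aggregated, (30, 8)
print(feats.shape, context.shape)
```

A real pipeline would replace the projection with a pretrained video backbone and the moving average with a learned temporal architecture, but the interface is the same: per-frame embeddings in, temporally contextualized embeddings out.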