Rendezvous in Time: An Attention-based Temporal Fusion approach for Surgical Triplet Recognition

One of the recent advances in surgical AI is the recognition of surgical activities as triplets of (instrument, verb, target). Although such triplets provide detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. We validate our proposal on the challenging surgical triplet dataset CholecT45, demonstrating improved recognition of the verb and the triplet, along with other interactions involving the verb, such as (instrument, verb). Qualitative results show that RiT produces smoother predictions for most triplet instances than state-of-the-art methods. In summary, we present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploits it for surgical triplet recognition.
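The idea of attending from the current frame to past frames can be illustrated with a small sketch. This is not the RiT architecture, which the abstract does not specify; it is a generic scaled dot-product attention over a window of per-frame feature vectors, written under the assumption that each frame has already been reduced to a fixed-length vector. The function name `temporal_attention_fusion` and the toy `window` values are hypothetical and chosen only for illustration.

```python
import math


def temporal_attention_fusion(frame_features):
    """Fuse per-frame feature vectors, ordered oldest to newest.

    Illustrative assumption, not the RiT design: the last vector (the
    current frame) acts as the query, and every frame in the window,
    including the current one, acts as both key and value.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    query = frame_features[-1]
    dim = len(query)
    # Scaled dot-product score between the current frame and each frame.
    scores = [
        sum(q * k for q, k in zip(query, frame)) / math.sqrt(dim)
        for frame in frame_features
    ]
    # Numerically stable softmax over the temporal axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the frame features.
    fused = [
        sum(w * frame[i] for w, frame in zip(weights, frame_features))
        for i in range(dim)
    ]
    return fused, weights


# Toy window of three frames with two-dimensional features (made-up values).
window = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
fused, weights = temporal_attention_fusion(window)
```

In this toy window the frames most similar to the current one receive the largest weights, so the fused vector stays close to the current frame while still mixing in earlier context. A learned model would normally apply trainable projections to the queries, keys, and values, which this sketch leaves out.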

