Deep Multimodal Fusion for Surgical Feedback Classification

Quantifying the real-time informal feedback that an experienced surgeon gives a trainee during surgery is important for skill improvement in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversation (e.g., questions and answers) as well as non-verbal elements (e.g., visual cues such as pointing to anatomic elements). In this work, we leverage a clinically validated five-category classification of surgical feedback: "Anatomic", "Technical", "Procedural", "Praise" and "Visual Aid". We then develop a multi-label machine learning model to classify these five categories of surgical feedback from text, audio, and video inputs. The ultimate goal of our work is to help automate the annotation of real-time contextual surgical feedback at scale. Our automated classification of surgical feedback achieves AUCs ranging from 71.5 to 77.6, with fusion improving performance by 3.1%. We also show that high-quality manual transcriptions of feedback audio from experts improve AUCs to between 76.5 and 96.2, which demonstrates a clear path toward future improvements. Empirically, we find that a staged training strategy, which first pre-trains each modality separately and then trains them jointly, is more effective than training all modalities together from the start. We also present intuitive findings on the importance of each modality for different feedback categories. This work offers an important first look at the feasibility of automated classification of real-world live surgical feedback based on text, audio, and video modalities.
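
As a rough illustration of what multi-label classification over three modalities can look like, the sketch below averages per-modality logits for the five categories and thresholds each category independently, so one piece of feedback may carry several labels. This is a simple late-fusion stand-in, not the fusion architecture or staged training procedure from the paper; the function names, the logit values, and the 0.5 threshold are all assumptions made for the example.

```python
import math

# The five feedback categories named in the abstract.
CATEGORIES = ["Anatomic", "Technical", "Procedural", "Praise", "Visual Aid"]


def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse_logits(per_modality_logits, weights=None):
    """Weighted average of per-modality logits, one value per category.

    per_modality_logits: dict mapping a modality name to a list of
    len(CATEGORIES) logits. weights: optional dict of non-negative
    modality weights (hypothetical; defaults to equal weighting).
    """
    modalities = list(per_modality_logits)
    if weights is None:
        weights = {m: 1.0 for m in modalities}
    total = sum(weights[m] for m in modalities)
    fused = []
    for i in range(len(CATEGORIES)):
        s = sum(weights[m] * per_modality_logits[m][i] for m in modalities)
        fused.append(s / total)
    return fused


def predict_labels(per_modality_logits, threshold=0.5, weights=None):
    """Multi-label decision: each category is thresholded on its own."""
    fused = fuse_logits(per_modality_logits, weights)
    return {cat: sigmoid(z) >= threshold for cat, z in zip(CATEGORIES, fused)}


# Made-up logits standing in for the outputs of three unimodal models.
example = {
    "text":  [2.0, -1.0, 0.0, -3.0, 1.0],
    "audio": [1.0, -2.0, 0.5, -1.0, -1.0],
    "video": [0.0, -0.5, 1.0, -2.0, 3.0],
}
labels = predict_labels(example)
print(labels)
```

Because each category gets its own sigmoid rather than a shared softmax, the categories do not compete with one another, which is what distinguishes a multi-label setup from a single-label one.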

