Quantifying the real-time informal feedback that an experienced surgeon delivers to a trainee during surgery is important for skill improvement in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversation (e.g., questions and answers) as well as non-verbal elements (e.g., visual cues such as pointing to anatomic elements). In this work, we leverage a clinically validated five-category classification of surgical feedback: "Anatomic", "Technical", "Procedural", "Praise", and "Visual Aid". We then develop a multi-label machine learning model that classifies these five categories of surgical feedback from text, audio, and video inputs. The ultimate goal of our work is to help automate the annotation of real-time contextual surgical feedback at scale. Our automated classification of surgical feedback achieves AUCs ranging from 71.5 to 77.6, with multimodal fusion improving performance by 3.1%. We also show that high-quality manual transcriptions of the feedback audio, produced by experts, improve AUCs to between 76.5 and 96.2, which demonstrates a clear path toward future improvements. Empirically, we find that a staged training strategy, which first pre-trains each modality separately and then trains them jointly, is more effective than training all modalities together from the start. We also present intuitive findings on the importance of each modality for different feedback categories. This work offers an important first look at the feasibility of automated classification of real-world live surgical feedback based on text, audio, and video modalities.
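Because the task is multi-label, a single feedback instance can belong to several of the five categories at once, so AUC is naturally reported per category. The following is a minimal sketch of that evaluation, not the authors' implementation: the function names, the pairwise AUC formulation, and the toy labels and scores are all assumptions made for illustration, and the numbers it produces are unrelated to the results reported above.

```python
from typing import Dict, List, Sequence

# Category names come from the abstract; everything else here is illustrative.
CATEGORIES = ["Anatomic", "Technical", "Procedural", "Praise", "Visual Aid"]


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def per_category_auc(
    y_true: List[List[int]], y_score: List[List[float]]
) -> Dict[str, float]:
    """Compute one AUC per feedback category, scaled to 0-100."""
    result = {}
    for k, name in enumerate(CATEGORIES):
        labels = [row[k] for row in y_true]
        scores = [row[k] for row in y_score]
        result[name] = 100.0 * binary_auc(labels, scores)
    return result


# Toy data (hypothetical): 4 feedback instances, one column per category.
y_true = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
y_score = [
    [0.9, 0.2, 0.1, 0.80, 0.3],
    [0.4, 0.7, 0.3, 0.75, 0.9],
    [0.6, 0.8, 0.7, 0.20, 0.1],
    [0.1, 0.3, 0.6, 0.70, 0.4],
]

aucs = per_category_auc(y_true, y_score)
for name, value in aucs.items():
    print(f"{name}: {value:.1f}")
```

The pairwise loop is quadratic in the number of instances, which is fine for a sketch; a rank-based computation or a library routine would be the usual choice at scale.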