High-quality intraoperative feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback promises timely, accessible, and consistent guidance at scale, but requires models that understand clinically relevant representations. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. Our contributions are (1) mining Instrument-Action-Target (IAT) triplets from real-world feedback text and clustering surface forms into normalized categories, (2) fine-tuning a video-to-IAT model that leverages surgical procedure and task context as well as fine-grained temporal instrument motion, and (3) demonstrating how IAT triplet representations can effectively guide GPT-4o in generating clinically grounded, trainer-style feedback. On Task 1, video-to-IAT recognition, our context injection and temporal tracking deliver consistent AUC gains (Instrument: 0.67 to 0.74; Action: 0.60 to 0.63; Tissue: 0.74 to 0.79). On Task 2, feedback text generation (rated on a 1-5 fidelity rubric where 1 = opposite/unsafe, 3 = admissible, and 5 = perfect match to a human trainer), GPT-4o from video alone scores 2.17, while IAT conditioning reaches 2.44 (+12.4%), doubling the share of admissible generations with score >= 3 from 21% to 42%. Traditional text-similarity metrics also improve: word error rate decreases by 15-31% and ROUGE (phrase/substring overlap) increases by 9-64%. Grounding generation in explicit IAT structure improves fidelity and yields clinician-verifiable rationales, supporting auditable use in surgical training.
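To make the IAT-conditioning idea concrete, the sketch below shows how a normalized Instrument-Action-Target triplet might be represented and serialized into a generation prompt. This is a minimal illustration under assumed names (the `IATTriplet` class, field vocabulary, and prompt wording are hypothetical, not the paper's actual implementation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IATTriplet:
    """Hypothetical normalized Instrument-Action-Target triplet,
    of the kind mined from trainer feedback transcripts."""
    instrument: str
    action: str
    target: str  # tissue or structure acted upon

    def to_phrase(self) -> str:
        return f"{self.instrument} {self.action} {self.target}"

def build_feedback_prompt(triplets, procedure: str, task: str) -> str:
    """Condition a text generator on recognized IAT structure
    plus procedure/task context (illustrative prompt format)."""
    lines = [f"Procedure: {procedure}", f"Task: {task}", "Observed actions:"]
    lines += [f"- {t.to_phrase()}" for t in triplets]
    lines.append("Give concise, trainer-style intraoperative feedback.")
    return "\n".join(lines)

# Example usage with made-up values:
triplet = IATTriplet("needle driver", "grasps", "urethra")
prompt = build_feedback_prompt([triplet], "RARP", "urethrovesical anastomosis")
```

Keeping the conditioning signal as an explicit, human-readable structure like this is what makes the downstream generations clinician-verifiable: a reviewer can audit the recognized triplets independently of the generated text.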


