General-purpose foundation models for increased autonomy in robot-assisted surgery

The dominant paradigm for end-to-end robot learning optimizes task-specific objectives that solve a single robotic problem, such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown the promise of training on large, diverse, task-agnostic datasets of video demonstrations. These models generalize impressively to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have not advanced as quickly as other areas of robot learning, for three reasons: (1) large-scale open-source training data is lacking, (2) the soft-body deformations these robots encounter during surgery are hard to model because simulation cannot match the physical and visual complexity of biological tissue, and (3) surgical robots risk harming patients when tested in clinical trials and therefore require more extensive safety measures. This perspective article aims to chart a path toward greater autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision-language-action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models, and we propose three guiding actions toward increased autonomy in robot-assisted surgery.

