To advance autonomous dexterous manipulation, we propose a hybrid control method that combines the relative advantages of a fine-tuned Vision-Language-Action (VLA) model and a diffusion model. The VLA model provides language-commanded high-level planning, which is highly generalizable, while the diffusion model handles low-level interactions, offering the precision and robustness required for specific objects and environments. By incorporating a switching signal into the training data, we enable event-based transitions between the two models for a pick-and-place task in which the target object and placement location are commanded through language. This approach is deployed on our anthropomorphic ADAPT Hand 2, a 13-DoF robotic hand that incorporates compliance through series elastic actuation, providing resilience to arbitrary interactions; this is the first use of a multi-fingered hand controlled with a VLA model. We demonstrate that this model-switching approach achieves an over 80\% success rate, compared to under 40\% when using only a VLA model, enabled by the VLA model's accurate near-object arm motion and the diffusion model's multi-modal grasping motion with error-recovery abilities.
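The event-based hand-off described above can be sketched as a small control loop. This is a minimal illustration only, assuming hypothetical policy interfaces and a learned scalar switching signal in the VLA output; the actual models, observation contents, and threshold are not specified in the abstract.

```python
# Minimal sketch of event-based switching between a high-level VLA policy
# and a low-level diffusion policy. All class names, the "switch" output,
# and the distance-based trigger are illustrative assumptions, not the
# paper's actual implementation.

class VLAPolicy:
    """Stand-in for a fine-tuned VLA model: returns an arm action plus a
    switching signal in [0, 1] (1 = hand control to the diffusion model)."""
    def act(self, observation, instruction):
        near_object = observation.get("dist_to_object", 1.0) < 0.05
        return {"action": "approach", "switch": 1.0 if near_object else 0.0}

class DiffusionPolicy:
    """Stand-in for a diffusion model handling low-level, contact-rich
    grasping with multi-modal behavior and error recovery."""
    def act(self, observation):
        return {"action": "grasp"}

def hybrid_step(state, vla, diff, observation, instruction, threshold=0.5):
    """One control step: the VLA policy runs until the switching signal
    (trained into the data) crosses the threshold, after which the
    diffusion policy takes over for the grasping phase."""
    if state["phase"] == "vla":
        out = vla.act(observation, instruction)
        if out["switch"] > threshold:
            state["phase"] = "diffusion"  # event-based transition
        return out["action"]
    return diff.act(observation)["action"]
```

A symmetric signal could trigger the hand-back to the VLA model once the grasp succeeds, yielding the full pick-and-place sequence.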