We propose CompliantVLA-adaptor, which augments state-of-the-art Vision-Language-Action (VLA) models with vision-language model (VLM)-informed, context-aware variable impedance control (VIC) to improve the safety and effectiveness of contact-rich robotic manipulation. Existing VLA systems (e.g., RDT, Pi0, OpenVLA-oft) typically output position commands but lack force-aware adaptation, leading to unsafe or failed interactions in physical tasks involving contact, compliance, or uncertainty. In the proposed CompliantVLA-adaptor, a VLM interprets task context from images and natural language to adapt the stiffness and damping parameters of a VIC controller. These parameters are further regulated using real-time force/torque feedback to keep interaction forces within safe thresholds. We demonstrate that our method outperforms VLA baselines on a suite of complex contact-rich tasks, both in simulation and on real hardware, with improved success rates and fewer force violations. The overall success rate across all tasks increases from 9.86\% to 17.29\%, presenting a promising path towards safe contact-rich manipulation using VLAs. We release our code, prompts, and force-torque-impedance-scenario context datasets at https://sites.google.com/view/compliantvla.
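To make the control scheme concrete, the following is a minimal sketch of a variable impedance step with force-threshold regulation, in the spirit of the abstract. All names (`vic_step`, `regulate_stiffness`), gains, and thresholds are illustrative assumptions, not the paper's actual implementation; in the real system the initial stiffness and damping would come from the VLM's context interpretation.

```python
import numpy as np

def vic_step(x, x_d, v, v_d, K, D):
    # Cartesian impedance law: commanded force from stiffness K and damping D.
    # x, v: measured position/velocity; x_d, v_d: desired position/velocity.
    return K * (x_d - x) + D * (v_d - v)

def regulate_stiffness(K, f_meas, f_max, k_min=50.0, decay=0.5):
    # Illustrative safety rule (our assumption): if the measured contact force
    # exceeds the safe threshold f_max, soften the controller by scaling K down,
    # never below a floor k_min that preserves tracking.
    if abs(f_meas) > f_max:
        K = max(k_min, K * decay)
    return K

# One control tick: VLM-suggested stiffness, then force-feedback regulation.
K = regulate_stiffness(K=400.0, f_meas=30.0, f_max=20.0)  # force too high -> K halved to 200.0
f_cmd = vic_step(x=0.0, x_d=0.1, v=0.0, v_d=0.0, K=K, D=40.0)
```

The two-stage structure mirrors the abstract: a slow, context-driven outer loop (here, the initial `K`) and a fast force/torque safety loop (`regulate_stiffness`).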