We propose CompliantVLA-adaptor, which augments state-of-the-art Vision-Language-Action (VLA) models with vision-language model (VLM)-informed, context-aware variable impedance control (VIC) to improve the safety and effectiveness of contact-rich robotic manipulation. Existing VLA systems (e.g., RDT, Pi0.5, OpenVLA-oft) typically output position commands but lack force-aware adaptation, leading to unsafe or failed interactions in physical tasks involving contact, compliance, or uncertainty. In the proposed CompliantVLA-adaptor, a VLM interprets task context from images and natural language to set the stiffness and damping parameters of a VIC controller. These parameters are further regulated using real-time force/torque feedback so that interaction forces remain within safe thresholds. We demonstrate that our method outperforms VLA baselines on a suite of complex contact-rich tasks, both in simulation and in the real world, achieving higher success rates and fewer force violations. This work presents a promising path towards a safe foundation model for physical contact-rich manipulation. We release our code, prompts, and force-torque-impedance-scenario context datasets at https://sites.google.com/view/compliantvla.
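To make the control loop concrete, the following is a minimal sketch of the two pieces the abstract describes: a Cartesian impedance law whose stiffness and damping gains would come from the VLM, and a force/torque safety regulator that scales those gains down whenever the measured interaction force exceeds a threshold. All function names, gain values, and thresholds here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def regulate_gains(K, D, f_measured, f_max, min_scale=0.1):
    """Scale stiffness K and damping D down when the measured
    force/torque magnitude exceeds the safe threshold f_max.
    (Hypothetical regulator; the paper's rule may differ.)"""
    f_norm = np.linalg.norm(f_measured)
    if f_norm <= f_max:
        return K, D
    # Shrink gains proportionally so the commanded wrench softens,
    # but never below min_scale of the VLM-suggested values.
    scale = max(min_scale, f_max / f_norm)
    return K * scale, D * scale

def vic_wrench(x, x_dot, x_des, K, D):
    """Cartesian variable impedance law: F = K (x_des - x) - D x_dot,
    with K, D supplied per task context (e.g., by the VLM)."""
    return K @ (x_des - x) - D @ x_dot

# Example: stiff gains from the VLM, then softened after a 30 N contact
# force exceeds a 15 N safety threshold.
K = 100.0 * np.eye(3)   # N/m (illustrative)
D = 20.0 * np.eye(3)    # N·s/m (illustrative)
F = vic_wrench(np.zeros(3), np.zeros(3), np.array([0.01, 0.0, 0.0]), K, D)
K_safe, D_safe = regulate_gains(K, D, np.array([30.0, 0.0, 0.0]), f_max=15.0)
```

In this sketch the VLM's role is reduced to choosing `K` and `D` offline per context, while `regulate_gains` runs at the force/torque feedback rate.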