Vision-Language-Action (VLA) models have become a powerful framework for robotic manipulation, and recent studies have incorporated tactile or force feedback into VLAs to address contact-rich tasks. However, these models are typically deployed as offline policies: when contact conditions shift away from the training distribution, the policy cannot adapt online, which leads to problems such as inappropriate contact forces and inefficient retries. We therefore propose TORL-VLA, a tactile-guided online reinforcement learning framework that couples tactile feedback with policy refinement for contact-rich manipulation. Our method introduces a wrench-aware VLA, with wrench information derived from tactile sensing, that predicts reference actions and future wrench sequences, while a lightweight online RL module refines these reference actions. To stabilize learning from a mixture of exploratory policy-generated data and human-intervention data, we introduce an intervention-censored critic that prevents success achieved after a human intervention from being wrongly credited to the policy-generated actions that preceded it. Real-robot experiments on long-horizon contact-rich tasks, including latch manipulation, coffee-cup placement, and egg handling, show that TORL-VLA outperforms strong baselines in success rate at both the subtask and full-task levels, as well as in time-bounded execution efficiency. Project page: https://torl-vla.github.io/
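The abstract does not specify how the intervention-censored critic computes its targets. The following is an illustrative sketch, not the TORL-VLA implementation. It assumes that "censoring" means cutting the return of a policy-generated action at the step where a human takes over, so that success reached during or after the intervention is not backed up into that action. The function `discounted_returns` and its `censor` flag are hypothetical names introduced only for this example.

```python
from typing import List, Sequence


def discounted_returns(
    rewards: Sequence[float],
    intervened: Sequence[bool],
    gamma: float = 0.99,
    censor: bool = True,
) -> List[float]:
    """Discounted returns for one trajectory (hypothetical censoring rule).

    rewards[t]    -- reward received at step t
    intervened[t] -- True if the action at step t came from a human
    censor        -- if True, a policy-generated step whose next step is a
                     human intervention does not receive the return that
                     accumulates after the human takes over
    """
    if len(rewards) != len(intervened):
        raise ValueError("rewards and intervened must have the same length")
    n = len(rewards)
    returns = [0.0] * n
    running = 0.0  # discounted return from step t + 1 onward
    for t in reversed(range(n)):
        next_is_intervention = t + 1 < n and intervened[t + 1]
        if censor and not intervened[t] and next_is_intervention:
            # Assumed censoring point: drop the post-intervention return so
            # it is not credited to the policy action preceding takeover.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Two policy steps, then a human takes over and reaches success.
    rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
    intervened = [False, False, True, True, True]
    print(discounted_returns(rewards, intervened, gamma=0.5, censor=False))
    # [0.0625, 0.125, 0.25, 0.5, 1.0]  (policy steps share the human's success)
    print(discounted_returns(rewards, intervened, gamma=0.5))
    # [0.0, 0.0, 0.25, 0.5, 1.0]       (policy steps receive no credit)
```

In a one-step TD critic, the same assumed rule would correspond to dropping the bootstrap term for any policy-generated transition whose next step is an intervention.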