RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks

Ruiying Li, Yunlang Zhou, YuYao Zhu, Kylin Chen, Jingyuan Wang, Sukai Wang, Kongtao Hu, Minhui Yu, Bowen Jiang, Zhan Su, Jiayao Ma, Xin He, Yongjian Shen, Yang Yang, Guanghui Ren, Maoqing Yao, Wenhao Wang, Yao Mu

Source: arXiv. Code available at: https://github.com/RoboClaw-Robotics/RoboClaw

Vision-Language-Action (VLA) systems have shown strong potential for language-driven robotic manipulation, but scaling them to long-horizon tasks remains challenging. Existing pipelines typically separate data collection, policy learning, and deployment, which leads to heavy reliance on manual environment resets and to brittle multi-policy execution. We present RoboClaw, an agentic robotics framework that unifies data collection, policy learning, and task execution under a single VLM-driven controller. At the policy level, RoboClaw introduces Entangled Action Pairs (EAP), which couple forward manipulation behaviors with inverse recovery actions to form self-resetting loops for autonomous data collection. This mechanism enables continuous on-policy data acquisition and iterative policy refinement with minimal human intervention. During deployment, the same agent performs high-level reasoning and dynamically orchestrates learned policy primitives to accomplish long-horizon tasks. By maintaining consistent contextual semantics across collection and execution, RoboClaw reduces the mismatch between the two phases and improves multi-policy robustness. Experiments on real-world manipulation tasks demonstrate improved stability and scalability compared to conventional open-loop pipelines, along with a significant reduction in human effort throughout the robot lifecycle. RoboClaw achieves a 25% improvement in success rate over baseline methods on long-horizon tasks and reduces human time investment by 53.7%.
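
The self-resetting loop behind Entangled Action Pairs can be illustrated with a toy sketch. This is not the RoboClaw implementation: every name below (`ToyEnv`, `EntangledActionPair`, `make_toy_pair`, `collect`) is hypothetical, the "scene" is a single boolean, and the forward skill is a random stand-in for a learned policy. The sketch only shows the control flow the abstract describes, assuming each forward manipulation is followed by its inverse recovery action so that the next episode can start without a manual reset.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyEnv:
    """One-bit stand-in for a scene: is the object in the drawer?"""
    in_drawer: bool = False


@dataclass
class EntangledActionPair:
    """Hypothetical container: a forward skill plus the inverse skill that undoes it."""
    name: str
    forward: Callable[[ToyEnv], bool]  # True if the manipulation succeeded
    inverse: Callable[[ToyEnv], bool]  # True if the scene is back in its start state


def make_toy_pair(rng: random.Random, p_success: float) -> EntangledActionPair:
    """Build a toy pair whose forward skill succeeds with probability p_success."""

    def forward(env: ToyEnv) -> bool:
        ok = rng.random() < p_success
        if ok:
            env.in_drawer = True
        return ok

    def inverse(env: ToyEnv) -> bool:
        # Recovery action: take the object back out (a no-op if forward failed).
        env.in_drawer = False
        return True

    return EntangledActionPair("place_in_drawer", forward, inverse)


def collect(
    pair: EntangledActionPair, env: ToyEnv, episodes: int
) -> Tuple[List[Tuple[str, bool]], int]:
    """Run forward/inverse loops, logging outcomes and counting manual resets."""
    log: List[Tuple[str, bool]] = []
    human_resets = 0
    for _ in range(episodes):
        log.append((pair.name + "/forward", pair.forward(env)))
        reset_ok = pair.inverse(env)
        log.append((pair.name + "/inverse", reset_ok))
        if not reset_ok:
            # Only a failed recovery would require a person to restore the scene.
            human_resets += 1
            env.in_drawer = False
    return log, human_resets


if __name__ == "__main__":
    records, resets = collect(make_toy_pair(random.Random(0), 0.8), ToyEnv(), 5)
    print(len(records), "records collected,", resets, "manual resets")
```

In this toy setting both the forward and the inverse executions are logged, so each loop yields two records, and a manual reset is counted only when the inverse action fails to restore the start state.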
