Generalist robot learning remains constrained by data: large-scale, diverse, high-quality interaction data are expensive to collect in the real world. While simulation has become a promising avenue for scaling up data collection, the associated steps, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents that generate expert demonstrations with the aim of solving as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; and 3) ViPR-RL, a hybrid planning-and-learning approach that jointly produces high-quality demonstrations from only sparse rewards. We train behavior cloning policies on the generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer-opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .