One of the central challenges preventing robots from acquiring complex manipulation skills is the prohibitive cost of collecting large-scale robot demonstrations. Humans, in contrast, learn efficiently by watching others interact with their environment. To bridge this gap, we introduce semantic action flow, a core intermediate representation that captures the essential spatio-temporal manipulator-object interactions and is invariant to superficial visual differences. We present ViSA-Flow, a framework that learns this representation in a self-supervised manner from large-scale unlabeled video data. First, a generative model is pre-trained on semantic action flows automatically extracted from large-scale human-object interaction videos, learning a robust prior over manipulation structure. Second, this prior is efficiently adapted to a target robot by fine-tuning on a small set of robot demonstrations processed through the same semantic abstraction pipeline. Extensive experiments on the CALVIN benchmark and on real-world tasks show that ViSA-Flow achieves state-of-the-art performance, particularly in low-data regimes, outperforming prior methods by effectively transferring knowledge from human video observation to robotic execution. Videos are available at https://visaflow-web.github.io/ViSAFLOW.


