Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators collect demonstrations blindly, without knowing the underlying policy's weaknesses, leading to inefficient coverage of critical state distributions. Conversely, interactive methods like DAgger effectively address covariate shift but rely on physical robot execution, which is costly and difficult to scale. To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using a single consumer smartphone. Its core innovation is a Remote Inference framework that visualizes the policy's predicted trajectory via Augmented Reality (AR) Visual Foresight. This immersive feedback allows collectors to proactively identify potential failures and focus data collection on the policy's weak regions without requiring a physical robot. Furthermore, we implement an asynchronous Online Finetuning pipeline that continuously updates the policy with incoming data, effectively closing the learning loop in minutes. Extensive experiments demonstrate that RoboPocket adheres to data scaling laws and doubles data efficiency compared to offline scaling strategies, overcoming their long-standing efficiency bottleneck. Moreover, our instant iteration loop boosts sample efficiency by up to 2$\times$ in distributed environments with only a small number of interactive corrections per person. Project page and videos: https://robo-pocket.github.io.