High-reliability long-horizon robotic manipulation has traditionally relied on large-scale data and compute to understand complex real-world dynamics. However, we identify that the primary bottleneck to real-world robustness is not resource scale alone, but the distributional shift among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution -- a systematic inconsistency that causes compounding errors in multi-stage tasks. To mitigate these inconsistencies, we propose $χ_{0}$, a resource-efficient framework with effective modules designated to achieve production-level robustness in robotic manipulation. Our approach builds off three technical pillars: (i) Model Arithmetic, a weight-space merging strategy that efficiently soaks up diverse distributions of different demonstrations, varying from object appearance to state variations; (ii) Stage Advantage, a stage-aware advantage estimator that provides stable, dense progress signals, overcoming the numerical instability of prior non-stage approaches; and (iii) Train-Deploy Alignment, which bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. $χ_{0}$ enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation, spanning tasks from flattening, folding, to hanging different clothes. Our method exhibits high-reliability autonomy; we are able to run the system from arbitrary initial state for consecutive 24 hours non-stop. Experiments validate that $χ_{0}$ surpasses the state-of-the-art $π_{0.5}$ in success rate by nearly 250%, with only 20-hour data and 8 A100 GPUs. Code, data and models will be released to facilitate the community.
翻译:高可靠性长程机器人操作传统上依赖于大规模数据和计算来理解复杂的现实世界动力学。然而,我们发现现实世界鲁棒性的主要瓶颈并非仅仅是资源规模,而是人类示教分布、策略学习到的归纳偏置以及测试时执行分布之间的分布偏移——这种系统性不一致会导致多阶段任务中的复合误差。为了缓解这些不一致性,我们提出了$χ_{0}$,一个资源高效的框架,其配备了旨在实现机器人操作生产级鲁棒性的有效模块。我们的方法建立在三大技术支柱之上:(i) 模型算术,一种权重空间合并策略,能高效吸收从物体外观到状态变化等不同示教的多样分布;(ii) 阶段优势,一种阶段感知的优势估计器,提供稳定、密集的进度信号,克服了先前非阶段方法的数值不稳定性;(iii) 训练-部署对齐,通过时空增强、启发式DAgger校正和时间分块平滑来弥合分布差距。$χ_{0}$使得两组双臂机器人能够协作编排长程衣物操作,涵盖从铺平、折叠到悬挂不同衣物的任务。我们的方法展现出高可靠性的自主性;我们能够从任意初始状态连续不间断地运行系统24小时。实验验证,$χ_{0}$在成功率上以仅20小时数据和8块A100 GPU,超越了最先进的$π_{0.5}$近250%。代码、数据和模型将被发布以促进社区发展。