Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, owing to severe data scarcity, the complexity of multi-agent coordination, and limited generalization across objects. In this paper, we present SynAgent, a unified framework that enables scalable and physically plausible cooperative manipulation by leveraging Solo-to-Cooperative Agent Synergy to transfer skills from single-agent human-object interaction to multi-agent human-object-human scenarios. To maintain semantic integrity during motion transfer, we introduce an interaction-preserving retargeting method based on an Interact Mesh constructed via Delaunay tetrahedralization, which faithfully maintains the spatial relationships among humans and objects. Building on this refined data, we propose a single-agent pretraining and adaptation paradigm that bootstraps synergistic collaborative behaviors from abundant single-human data through decentralized training and multi-agent PPO. Finally, we develop a trajectory-conditioned generative policy based on a conditional VAE, trained via multi-teacher distillation from motion imitation priors, to achieve stable and controllable object-level trajectory execution. Extensive experiments demonstrate that SynAgent significantly outperforms existing baselines in both cooperative imitation and trajectory-conditioned control, while generalizing across diverse object geometries. Code and data will be made available after publication. Project Page: http://yw0208.github.io/synagent
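The abstract mentions an Interact Mesh built by Delaunay tetrahedralization. As a rough illustration only, and not the authors' implementation, the following sketch shows how such a mesh might be built over a set of human and object keypoints and how a retargeting step could measure whether spatial relationships are preserved. It assumes the keypoints are given as an (N, 3) array, and it uses uniform-weight Laplacian coordinates as the preservation measure, which is an assumption borrowed from the general interaction-mesh literature rather than something stated in the abstract. The function names `build_interaction_mesh`, `laplacian_coordinates`, and `interaction_energy` are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def build_interaction_mesh(points):
    """Tetrahedralize (N, 3) keypoints and return (tetrahedra, unique edges).

    Assumption: `points` stacks human joint positions and sampled object
    points for one frame. The abstract does not specify the sampling.
    """
    tri = Delaunay(points)  # for 3D input, each simplex is a tetrahedron
    edges = set()
    for tet in tri.simplices:
        for i in range(4):
            for j in range(i + 1, 4):
                a, b = int(tet[i]), int(tet[j])
                edges.add((min(a, b), max(a, b)))
    return tri.simplices, sorted(edges)


def laplacian_coordinates(points, edges):
    """Offset of each vertex from the mean of its mesh neighbors.

    Uniform neighbor weights are an assumption made for this sketch.
    """
    points = np.asarray(points, dtype=float)
    neighbor_sum = np.zeros_like(points)
    degree = np.zeros(len(points))
    for a, b in edges:
        neighbor_sum[a] += points[b]
        neighbor_sum[b] += points[a]
        degree[a] += 1
        degree[b] += 1
    return points - neighbor_sum / np.maximum(degree, 1)[:, None]


def interaction_energy(source, target, edges):
    """Sum of squared differences between source and target Laplacian coordinates.

    A small value means the retargeted points keep the local spatial
    layout encoded by the mesh built on the source frame.
    """
    diff = laplacian_coordinates(source, edges) - laplacian_coordinates(target, edges)
    return float(np.sum(diff ** 2))
```

In a retargeting loop, an energy of this kind would typically be minimized over the target pose subject to skeleton and contact constraints. Because Laplacian coordinates are relative to neighbors, a rigid translation of the whole scene leaves the energy at zero, while moving a hand away from the object it grasps increases it.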