Bimanual robot systems substantially expand manipulation capabilities, but coordinating two arms introduces additional control complexity and failure modes that existing benchmarks do not capture well. We introduce DuoBench, an extensible benchmarking framework for bimanual manipulation policies on the FR3 Duo platform. DuoBench comprises eleven tasks spanning four coordination categories, implemented in simulation and partially replicated in the real world through reproducible task recipes with 3D-printable assets. We also propose a stage-based evaluation scheme that supports fine-grained semantic failure analysis beyond binary success, and we provide human-teleoperated datasets for all benchmark tasks. We benchmark several dual-arm imitation-learning and vision-language-action policies in simulation and on real hardware. Our results show that current policies still struggle with bimanual manipulation, particularly in early interaction stages, parallel arm execution, and transfer between simulation and real-world settings. DuoBench provides a reproducible testbed for diagnosing these failure modes and studying future methods for dual-arm policy learning. Code, datasets, and videos are available at https://duobench.github.io/