We introduce a new benchmarking suite for high-dimensional control, designed to test high spatial and temporal precision, coordination, and planning in an underactuated system that frequently makes and breaks contacts. The proposed challenge is mastering the piano through bimanual dexterity, using a pair of simulated anthropomorphic robot hands. We call it RoboPianist, and the initial version covers a broad set of 150 songs of varying difficulty. We investigate both model-free and model-based methods on the benchmark, characterizing their performance envelopes. We observe that while some existing methods, when well-tuned, can achieve impressive performance in certain respects, there is significant room for improvement. RoboPianist provides a rich quantitative benchmarking environment with human-interpretable results. It is easy to extend by simply adding new songs to the repertoire, and it opens opportunities for further research, including multi-task learning, zero-shot generalization, multimodal (sound, vision, touch) learning, and imitation. Supplementary information, including videos of our control policies, can be found at https://kzakka.com/robopianist/