Supervised visuomotor policies have shown strong performance in robotic manipulation but often struggle in tasks with limited visual input, such as operation in confined spaces or dimly lit environments, or tasks requiring precise perception of object properties and environmental interactions. In such cases, tactile feedback becomes essential for manipulation. While the rapid progress of supervised visuomotor policies has benefited greatly from high-quality, reproducible simulation benchmarks for visual imitation, the visuotactile domain still lacks a similarly comprehensive and reliable benchmark for large-scale, rigorous evaluation. To address this, we introduce ManiFeel, a reproducible and scalable simulation benchmark designed to systematically study supervised visuotactile policy learning. ManiFeel offers a diverse suite of contact-rich and visually challenging manipulation tasks, a modular evaluation pipeline spanning sensing modalities, tactile representations, and policy architectures, and real-world validation. Through extensive experiments, ManiFeel demonstrates how tactile sensing enhances policy performance across diverse manipulation scenarios, ranging from precise contact-driven operations to visually constrained settings. In addition, the results reveal task-dependent strengths of different tactile modalities and identify key design principles and open challenges for robust visuotactile policy learning. Real-world evaluations further confirm that ManiFeel provides a reliable and meaningful foundation for benchmarking and future visuotactile policy development. To foster reproducibility and future research, we will release our codebase, datasets, training logs, and pretrained checkpoints, aiming to accelerate progress toward generalizable visuotactile policy learning and manipulation.