Visual Dexterity: In-Hand Reorientation of Novel and Complex Object Shapes

In-hand object reorientation is necessary for performing many dexterous manipulation tasks, such as tool use in less structured environments that remain beyond the reach of current robots. Prior works built reorientation systems assuming one or more of the following: reorienting only specific objects with simple shapes, limited range of reorientation, slow or quasistatic manipulation, simulation-only results, the need for specialized and costly sensor suites, and other constraints which make the system infeasible for real-world deployment. We present a general object reorientation controller that does not make these assumptions. It uses readings from a single commodity depth camera to dynamically reorient complex and new object shapes by any rotation in real time, with the median reorientation time being close to seven seconds. The controller is trained using reinforcement learning in simulation and evaluated in the real world on new object shapes not used for training, including the most challenging scenario of reorienting objects held in the air by a downward-facing hand that must counteract gravity during reorientation. Our hardware platform only uses open-source components that cost less than five thousand dollars. Although we demonstrate the ability to overcome assumptions in prior work, there is ample scope for improving absolute performance. For instance, the challenging duck-shaped object not used for training was dropped in 56 percent of the trials. When it was not dropped, our controller reoriented the object within 0.4 radians (23 degrees) 75 percent of the time. Videos are available at: https://taochenshh.github.io/projects/visual-dexterity.
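The 0.4 radian (23 degree) tolerance quoted above can be made concrete with a small sketch. The abstract does not state how orientation error is measured, so the code below assumes the common choice of the geodesic angle between the current and goal orientations, both given as unit quaternions in (w, x, y, z) order. The function names `rotation_distance` and `within_tolerance` are hypothetical and are not taken from the authors' code.

```python
import math


def quat_normalize(q):
    """Return q scaled to unit length; q is a (w, x, y, z) tuple."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero-norm quaternion")
    return tuple(c / norm for c in q)


def rotation_distance(q_current, q_goal):
    """Smallest rotation angle in radians that takes q_current to q_goal.

    Assumption: orientation error is the geodesic angle on SO(3).
    The absolute value of the dot product handles the fact that
    q and -q represent the same rotation.
    """
    a = quat_normalize(q_current)
    b = quat_normalize(q_goal)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)


def within_tolerance(q_current, q_goal, tol_rad=0.4):
    """True if the orientation error is at most tol_rad (0.4 rad by default)."""
    return rotation_distance(q_current, q_goal) <= tol_rad


def quat_about_z(angle_rad):
    """Helper for the example: rotation by angle_rad about the z axis."""
    return (math.cos(angle_rad / 2.0), 0.0, 0.0, math.sin(angle_rad / 2.0))


if __name__ == "__main__":
    goal = (1.0, 0.0, 0.0, 0.0)  # identity orientation
    print(within_tolerance(quat_about_z(0.3), goal))  # error of 0.3 rad
    print(within_tolerance(quat_about_z(0.5), goal))  # error of 0.5 rad
    print(round(math.degrees(0.4)))  # the tolerance expressed in degrees
```

Under this assumed metric, a trial counts as a success when the final orientation error is at most 0.4 radians, which is the sense in which the 75 percent figure is read here.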
