Minigolf, a game with countless court layouts and complex ball motion, constitutes a compelling real-world testbed for the study of embodied intelligence, as it not only challenges spatial and kinodynamic reasoning but also requires reflective and corrective capacities to address erroneously designed courses. We introduce RoboGolf, a framework based on vision-language models (VLMs) that perceives dual-camera visual inputs and combines nested VLM-empowered closed-loop control with a reflective equilibrium loop. Extensive experiments demonstrate the effectiveness of RoboGolf on challenging minigolf courts, including those that are impossible to finish.