We propose KnotGym, an interactive environment for complex spatial reasoning and manipulation. KnotGym includes goal-oriented rope manipulation tasks of varying complexity, all requiring acting purely from image observations. Tasks are defined along a clear and quantifiable axis of complexity based on the number of knot crossings, creating a natural generalization test. KnotGym has a simple observation space, allowing for scalable development, yet it highlights core challenges in integrating acute perception, spatial reasoning, and grounded manipulation. We evaluate methods from different classes, including model-based RL, model-predictive control, and chain-of-thought reasoning, and illustrate the challenges KnotGym presents. KnotGym is available at https://github.com/lil-lab/knotgym.