When manipulating objects in the real world, we need reactive feedback policies that incorporate sensor information into their decisions. This study examines how different encoders can be used within a reinforcement learning (RL) framework to interpret the spatial environment in the local surroundings of a robot arm. Our investigation compares real-world vision inputs with 3D scene inputs, exploring new architectures in the process. We build on the SERL framework, which provides a sample-efficient and stable RL foundation while keeping training times minimal. Evaluated on a box-picking task with a vacuum gripper, our results indicate that policies using spatial information significantly outperform their vision-based counterparts. The code and evaluation videos are available at https://github.com/nisutte/voxel-serl.
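To make the notion of "3D scene inputs" concrete, the sketch below shows one common way such spatial observations are prepared for an encoder: discretizing a point cloud from a depth sensor into a binary voxel occupancy grid. This is a generic illustration of the technique, not the paper's actual pipeline; the function name, grid parameters, and use of NumPy are assumptions.

```python
import numpy as np

def voxelize(points, origin, voxel_size, grid_shape):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    Hypothetical helper illustrating the kind of spatial input a
    voxel-based RL encoder might consume.
    """
    # Map each point to integer voxel indices relative to the grid origin.
    idx = np.floor((points - origin) / voxel_size).astype(int)
    # Discard points that fall outside the grid bounds.
    mask = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=np.uint8)
    grid[tuple(idx[mask].T)] = 1
    return grid
```

A grid like this (e.g. a small cube around the gripper) can then be fed to a 3D convolutional encoder, in the same way an image observation is fed to a 2D one.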