An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based RL tasks. The quality of the environment representation directly affects how well the learning task can be accomplished. Prior vision-based RL typically represents environments explicitly or implicitly, e.g., as images, points, voxels, or neural radiance fields. However, these representations have several drawbacks: they either cannot describe complex local geometries, fail to generalize to unseen scenes, or require precise foreground masks. Moreover, implicit neural representations act like a ``black box'', significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework, called GSRL, to serve as the representation for RL tasks. Validated in the RoboMimic environment, our method outperforms other baselines on multiple tasks, improving performance by 10%, 44%, and 15% over the baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.