This paper presents a novel method for learning reward functions for robotic motions using a CLIP-based model. Reward function design traditionally relies on manual feature engineering, which often fails to generalize across tasks. Our approach avoids this problem by exploiting CLIP's ability to process both state features and image inputs. Given a pair of consecutive observations, the model identifies the motion performed between them. We report results on several robotic tasks, including moving a gripper to a designated target and repositioning a cube. Experimental evaluations show that the method infers motion accurately and indicate its potential to improve reinforcement learning training in robotics.
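To make the idea of using motion identification as a reward signal concrete, the following is a minimal sketch rather than the paper's implementation. It assumes, in the usual CLIP style, that a pair of consecutive observations is embedded into one vector and that each candidate motion is embedded into another, and it scores the transition by a softmax over cosine similarities. The function names, the temperature value, and the toy embeddings are all hypothetical; the abstract does not specify the encoders or how the reward is formed.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def motion_probabilities(transition_emb, motion_embs, temperature=0.07):
    """Softmax over cosine similarities between one transition and each motion.

    transition_emb: shape (d,), assumed embedding of (obs_t, obs_t+1).
    motion_embs:    shape (k, d), assumed embeddings of k candidate motions.
    temperature:    assumed value; CLIP-style models typically learn it.
    """
    t = l2_normalize(transition_emb)
    m = l2_normalize(motion_embs)
    logits = (m @ t) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def motion_reward(transition_emb, motion_embs, target_index, temperature=0.07):
    """Hypothetical reward: probability assigned to the desired motion."""
    probs = motion_probabilities(transition_emb, motion_embs, temperature)
    return float(probs[target_index])


# Toy stand-ins for encoder outputs (not real CLIP embeddings).
motion_embs = np.eye(3, 4)                        # three candidate motions
transition_emb = np.array([0.1, 1.0, 0.0, 0.2])   # closest to motion 1

probs = motion_probabilities(transition_emb, motion_embs)
reward = motion_reward(transition_emb, motion_embs, target_index=1)
```

In a reinforcement learning loop, a value such as `reward` could be returned at each step in place of a hand-engineered reward, with `target_index` selecting the motion the task calls for.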