The ability to learn new tasks and quickly adapt to different variations or dimensions is an important attribute in agile robotics. In our previous work, we explored Behavior Trees and Motion Generators (BTMGs) as a robot arm policy representation to facilitate the learning and execution of assembly tasks. However, a BTMG implemented for a specific task may not be robust to changes in the environment and may not generalize well to variations of that task. We propose to extend the BTMG policy representation with a module that predicts BTMG parameters for a new task variation. To achieve this, we propose a model that combines a Gaussian process and a weighted support vector machine classifier. Taking BTMG parameters and task variations as inputs, this model predicts the performance measure and the feasibility of the resulting policy. Using the outputs of the model, we then construct a surrogate reward function that is used within an optimizer to maximize task performance over the BTMG parameters for a fixed task variation. To demonstrate the effectiveness of the proposed approach, we conducted experimental evaluations on push and obstacle avoidance tasks in simulation and with a real KUKA iiwa robot. Furthermore, we compared the performance of our approach with four baseline methods.
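The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn and NumPy, a hypothetical one-dimensional BTMG parameter `theta` and task variation `v`, a made-up toy function `run_episode` in place of real task executions, an arbitrary penalty for infeasible parameters, and a plain candidate search standing in for the optimizer.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.svm import SVC


def run_episode(theta, v):
    """Toy stand-in for executing a policy (hypothetical, not from the paper)."""
    reward = -(theta - v) ** 2   # assumed performance measure
    feasible = theta <= 0.8      # assumed feasibility constraint
    return reward, feasible


# Collect training data over (BTMG parameter, task variation) pairs.
grid = np.linspace(0.0, 1.0, 8)
X = np.array([[t, v] for t in grid for v in grid])
results = [run_episode(t, v) for t, v in X]
y = np.array([r for r, _ in results])
feas = np.array([int(f) for _, f in results])

# Gaussian process: predicts the performance measure.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4,
                              normalize_y=True)
gp.fit(X, y)

# Weighted SVM classifier: predicts feasibility. Infeasible samples get a
# larger weight here; the weighting scheme is an assumption.
weights = np.where(feas == 0, 2.0, 1.0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X, feas, sample_weight=weights)


def surrogate_reward(thetas, v, penalty=-10.0):
    """GP mean where the classifier predicts feasible, a fixed penalty elsewhere."""
    inputs = np.column_stack([thetas, np.full_like(thetas, v)])
    predicted = gp.predict(inputs)
    feasible = clf.predict(inputs) == 1
    return np.where(feasible, predicted, penalty)


def best_parameters(v, n_candidates=101):
    """Maximize the surrogate over theta for a fixed task variation v."""
    candidates = np.linspace(0.0, 1.0, n_candidates)
    scores = surrogate_reward(candidates, v)
    return float(candidates[np.argmax(scores)])


# Query a task variation that was not in the training grid.
best_theta = best_parameters(v=0.5)
print(best_theta)
```

Only the surrogate is queried during the search, so no further task executions are needed to propose parameters for the new variation. In a real setting, the candidate search would typically be replaced by a proper optimizer over a higher-dimensional parameter space.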