Robots need to estimate the material and dynamic properties of objects from observations to simulate them accurately. We present a Bayesian optimization approach to identifying the material property parameters of objects from a set of observations. Our focus is on estimating these properties from observations of scenes that contain different sets of interacting objects. We propose an approach that exploits the structure of the reward function by modeling the reward for each observation separately, using only the parameters of the objects in that scene as inputs. The resulting lower-dimensional models generalize better over the parameter space, which in turn speeds up the optimization. To speed up the optimization further and reduce the number of simulation runs needed to find good parameter values, we also propose partial evaluations of the reward function, in which the selected parameters are evaluated on only a subset of the real-world observations. We evaluate the approach on a set of scenes with a wide range of object interactions and show that it can perform incremental learning effectively without resetting the rewards of the gathered observations.
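The structured reward model and the partial evaluations described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scene names, the parameter values, the toy `scene_reward` function standing in for a simulation run, the small hand-written Gaussian process with an RBF kernel, the summation of per-scene predictions into a total, and the use of random sampling where a real Bayesian optimizer would use an acquisition function.

```python
import numpy as np

# Hypothetical setup: three objects with one unknown parameter each,
# and two observed scenes that each involve only a subset of the objects.
SCENES = {"scene_a": [0, 1], "scene_b": [1, 2]}
TRUE_THETA = np.array([0.3, 0.6, 0.8])  # made-up stand-in for the real values


def scene_reward(scene, theta):
    """Toy replacement for simulating one scene and scoring it against its observation."""
    idx = SCENES[scene]
    return -float(np.sum((theta[idx] - TRUE_THETA[idx]) ** 2))


class SceneModel:
    """Small GP regressor (RBF kernel, zero mean) over the parameters of one scene only."""

    def __init__(self, idx, length_scale=0.3, noise=1e-4):
        self.idx, self.ls, self.noise = idx, length_scale, noise
        self.X, self.y = [], []

    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / self.ls ** 2)

    def add(self, theta, reward):
        # Store only the parameters of the objects present in this scene.
        self.X.append(theta[self.idx])
        self.y.append(reward)

    def predict(self, theta):
        if not self.X:
            return 0.0  # prior mean when the scene has no evaluations yet
        X, y = np.array(self.X), np.array(self.y)
        K = self._kernel(X, X) + self.noise * np.eye(len(X))
        k = self._kernel(theta[self.idx][None, :], X)
        return float((k @ np.linalg.solve(K, y))[0])


def evaluate(models, theta, scenes):
    """Partial evaluation: score theta on the given subset of scenes only."""
    for name in scenes:
        models[name].add(theta, scene_reward(name, theta))


def predicted_total(models, theta):
    """Assumed combination rule: sum the per-scene predictions."""
    return sum(m.predict(theta) for m in models.values())


rng = np.random.default_rng(0)
models = {name: SceneModel(idx) for name, idx in SCENES.items()}
names = list(SCENES)
for _ in range(15):
    theta = rng.uniform(0.0, 1.0, size=3)
    # Evaluate each sampled parameter vector on one randomly chosen scene.
    evaluate(models, theta, [names[rng.integers(len(names))]])

# Pick the candidate with the highest predicted total reward.
candidates = rng.uniform(0.0, 1.0, size=(500, 3))
best = max(candidates, key=lambda t: predicted_total(models, t))
print("best candidate under the surrogate:", np.round(best, 2))
```

Each `SceneModel` sees a two-dimensional input instead of the full three-dimensional parameter vector, which is the dimensionality reduction the abstract refers to. Because every scene keeps its own data, a new scene could be added by creating another `SceneModel` without discarding the evaluations already gathered for the others.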