In robotics, likelihood-free inference (LFI) can provide the domain distribution that adapts a learnt agent to a parametric set of deployment conditions. LFI assumes an arbitrarily chosen sampling support, which remains fixed as the initial generic prior is iteratively refined into more descriptive posteriors. However, a potentially misspecified support can lead to suboptimal, yet falsely certain, posteriors. To address this issue, we propose three heuristic LFI variants: EDGE, MODE, and CENTRE. Each interprets the shift of the posterior mode across inference steps in its own way and, when integrated into an LFI step, adapts the support alongside posterior inference. We first expose the support misspecification issue and evaluate our heuristics on stochastic dynamical benchmarks. We then evaluate the impact of heuristic support adaptation on parameter inference and policy learning for a dynamic deformable linear object (DLO) manipulation task. Inference yields a finer classification of length and stiffness across a parametric set of DLOs. When the resulting posteriors are used as domain distributions for sim-based policy learning, they lead to more robust object-centric agent performance.
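To make the idea of adapting the support alongside posterior inference concrete, the following is a minimal toy sketch, not the paper's method: a 1-D rejection-ABC loop over a uniform support, with an assumed mode-recentring rule standing in for the EDGE/MODE/CENTRE heuristics. The simulator, the acceptance tolerance, and the `adapt_support` rule are all illustrative assumptions.

```python
# Toy sketch (assumptions throughout): rejection ABC with a uniform support
# that is recentred on the posterior mode after each inference round.
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta):
    # Hypothetical stochastic simulator: summary statistic = theta + noise.
    return theta + rng.normal(0.0, 0.5, size=theta.shape)

def posterior_mode(samples):
    # Crude 1-D mode estimate via histogram peak.
    hist, edges = np.histogram(samples, bins=30)
    i = int(np.argmax(hist))
    return 0.5 * (edges[i] + edges[i + 1])

def adapt_support(lo, hi, mode, shrink=0.8, widen=1.25):
    # Assumed recentring rule (not the authors' EDGE/MODE/CENTRE): recentre
    # the uniform support on the current mode; widen it if the mode sits near
    # a support edge (a symptom of misspecification), otherwise shrink it.
    half = 0.5 * (hi - lo)
    near_edge = min(mode - lo, hi - mode) < 0.1 * half
    half *= widen if near_edge else shrink
    return mode - half, mode + half

x_obs = 2.7                       # observed summary statistic
lo, hi = -10.0, 10.0              # possibly misspecified initial support

for step in range(5):
    theta = rng.uniform(lo, hi, size=5000)      # sample from current support
    x = simulate(theta)
    accepted = theta[np.abs(x - x_obs) < 0.3]   # ABC rejection step
    if accepted.size == 0:
        break                                    # support lost the posterior
    mode = posterior_mode(accepted)
    lo, hi = adapt_support(lo, hi, mode)        # adapt support with inference
    print(f"step {step}: support=({lo:.2f}, {hi:.2f}), mode={mode:.2f}")
```

Under a fixed support, the same loop would keep sampling the full initial box; the sketch shows how a mode-based rule instead concentrates (or, near an edge, extends) the sampling region between inference steps.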