Complex robot navigation and control problems can be framed as policy search problems. However, interactive learning in uncertain environments can be expensive, which calls for data-efficient methods. Bayesian optimization is an efficient nonlinear optimization method in which queries are carefully selected to gather information about the location of the optimum. This is achieved through a surrogate model, which encodes past information, and an acquisition function, which selects the next query. Bayesian optimization can be very sensitive to uncertainty in the input data or in the prior assumptions. In this work, we incorporate both robust optimization and statistical robustness, and show that the two types of robustness are synergistic. For robust optimization, we use an improved version of unscented Bayesian optimization, which provides safe and repeatable policies in the presence of policy uncertainty. We also provide new theoretical insights. For statistical robustness, we use an adaptive surrogate model and introduce Boltzmann selection as a stochastic acquisition method, which yields convergence guarantees and improved performance even in the presence of surrogate modeling errors. We present results on several optimization benchmarks and robot tasks.
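To illustrate the idea of a stochastic acquisition method, the following is a minimal sketch of Boltzmann selection over a finite set of candidate queries. It assumes the acquisition function has already been evaluated at each candidate. The function names and the inverse-temperature parameter `beta` are hypothetical. The abstract does not say how the temperature is set or scheduled, so this is not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def boltzmann_probabilities(acq_values: Sequence[float], beta: float) -> List[float]:
    """Return softmax(beta * a_i), i.e. a Boltzmann (Gibbs) distribution.

    `beta` is an assumed inverse-temperature parameter: beta = 0 gives a
    uniform distribution, and large beta concentrates mass on the argmax.
    """
    if not acq_values:
        raise ValueError("need at least one candidate")
    scaled = [beta * a for a in acq_values]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(
    candidates: Sequence[T],
    acq_values: Sequence[float],
    beta: float,
    rng: Optional[random.Random] = None,
) -> T:
    """Sample one candidate with probability proportional to exp(beta * a(x))."""
    if len(candidates) != len(acq_values):
        raise ValueError("candidates and acq_values must have the same length")
    if rng is None:
        rng = random.Random()
    probs = boltzmann_probabilities(acq_values, beta)
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx]


if __name__ == "__main__":
    # Hypothetical 1-D candidate queries and their acquisition values.
    xs = [0.0, 0.5, 1.0]
    acq = [0.1, 0.9, 0.3]
    print(boltzmann_probabilities(acq, beta=5.0))
    print(boltzmann_select(xs, acq, beta=5.0, rng=random.Random(0)))
```

This sketch does not always take the argmax of the acquisition function. Instead, it samples the next query with probability proportional to exp(beta · a(x)). With beta = 0 this reduces to uniform random sampling over the candidates. As beta grows, it approaches the usual greedy maximization. Intuitively, the remaining randomness keeps some probability on queries that a misspecified surrogate undervalues.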