Lengthy evaluation times are common in many optimization problems such as direct policy search tasks, especially when evaluations are conducted in the physical world, e.g., in robotics applications. Often, when evaluating a solution over a fixed time period, it becomes clear that the objective value will not increase with additional computation time (for example, when a two-wheeled robot continuously spins on the spot). In such cases, it makes sense to stop the evaluation early to save computation time. However, most approaches to stopping the evaluation are problem-specific and need to be designed for the task at hand. Therefore, we propose an early stopping method for direct policy search. The proposed method only looks at the objective value at each time step and requires no problem-specific knowledge. We test the introduced stopping criterion in five direct policy search environments drawn from games, robotics, and classic control domains, and show that it can save up to 75% of the computation time. We also compare it with problem-specific stopping criteria and show that it performs comparably while being more generally applicable.
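The abstract does not state the stopping rule itself, so the following is only a minimal sketch of what an objective-only criterion could look like, not the authors' method. It assumes a simple stagnation rule: the evaluation stops once the cumulative objective has failed to improve for `patience` consecutive time steps. The names `evaluate_with_early_stopping`, `step_fn`, `patience`, and `spinning_robot` are all hypothetical and introduced here for illustration.

```python
from typing import Callable, Tuple


def evaluate_with_early_stopping(
    step_fn: Callable[[int], float],
    max_steps: int,
    patience: int,
) -> Tuple[float, int]:
    """Evaluate one candidate policy, stopping once the objective stagnates.

    step_fn(t) is a hypothetical callback that advances the environment by
    one time step and returns the reward obtained at step t.
    Returns (cumulative objective, number of steps actually evaluated).
    """
    total = 0.0
    best = float("-inf")
    steps_since_improvement = 0

    for t in range(max_steps):
        total += step_fn(t)
        if total > best:
            # The objective improved, so reset the stagnation counter.
            best = total
            steps_since_improvement = 0
        else:
            steps_since_improvement += 1
            if steps_since_improvement >= patience:
                # Assumed rule: no improvement for `patience` steps.
                return total, t + 1

    return total, max_steps


def spinning_robot(t: int) -> float:
    """Toy episode: progress for 10 steps, then the robot spins on the spot."""
    return 1.0 if t < 10 else 0.0


objective, steps_used = evaluate_with_early_stopping(
    spinning_robot, max_steps=100, patience=5
)
print(objective, steps_used)
```

In this toy episode the evaluation ends after 15 of the 100 available steps, because the objective stops growing at step 10 and the assumed patience of 5 steps then runs out. This rule uses only the per-step objective value, in the spirit of the abstract, but the choice of `patience` and the use of strict improvement are design assumptions rather than details taken from the paper.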