Lengthy evaluation times are common in many optimization problems, such as direct policy search tasks, especially when evaluations take place in the physical world, e.g. in robotics applications. Often, when a solution is evaluated over a fixed time period, it becomes clear partway through that the objective value will not increase with additional computation time (for example, when a two-wheeled robot continuously spins on the spot). In such cases, it makes sense to stop the evaluation early to save computation time. However, most approaches for stopping evaluations early are problem-specific and must be designed for the task at hand. Therefore, we propose an early stopping method for direct policy search. The proposed method only looks at the objective value at each time step and requires no problem-specific knowledge. We test the introduced stopping criterion in five direct policy search environments drawn from games, robotics, and classic control domains, and show that it can save up to 75% of the computation time. We also compare it with problem-specific stopping criteria and demonstrate that it performs comparably while being more generally applicable.
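As a rough illustration of the general idea only, the following sketch uses a generic patience-based rule that inspects nothing but the objective value at each time step. It is not the stopping criterion proposed in this work, whose details are not given here. The function name `evaluate_with_early_stop` and the `patience` and `min_delta` parameters are hypothetical choices made for this sketch.

```python
from typing import Iterable, Tuple


def evaluate_with_early_stop(
    objective_values: Iterable[float],
    patience: int = 50,
    min_delta: float = 1e-6,
) -> Tuple[float, int]:
    """Hypothetical patience-based early stopping for a single evaluation.

    `objective_values` yields the objective value observed after each time
    step. The evaluation is cut short once the best value seen so far has not
    improved by more than `min_delta` for `patience` consecutive steps.

    Returns (last observed objective value, number of time steps used).
    """
    best = float("-inf")
    steps_without_improvement = 0
    steps = 0
    last = float("-inf")
    for value in objective_values:
        steps += 1
        last = value
        if value > best + min_delta:
            # The objective is still improving: reset the patience counter.
            best = value
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
            if steps_without_improvement >= patience:
                # No progress for `patience` steps: stop the evaluation early.
                break
    return last, steps


# A robot that makes progress for 3 steps and then spins on the spot,
# so its objective stays flat for the rest of a 1000-step budget.
spinning = [1.0, 2.0, 3.0] + [3.0] * 1000
print(evaluate_with_early_stop(spinning, patience=10))  # (3.0, 13)
```

Here the flat objective triggers the stop after 13 of the 1003 available steps. A steadily improving trajectory would instead run to the end of its budget. The patience length trades off saved computation against the risk of stopping a solution that would have improved later.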