Planning agents are ill-equipped to act in novel situations in which their domain model no longer accurately represents the world. We introduce an approach that lets such agents, operating in open worlds, detect the presence of novelties and adapt their domain models and, consequently, their action selection. The approach observes the outcomes of action execution and measures their divergence from the outcomes expected under the environment model to infer the existence of a novelty. It then revises the model through a heuristic-guided search over model changes. We report empirical evaluations on CartPole, a standard Reinforcement Learning (RL) benchmark. The results show that our approach can handle a class of novelties quickly and in an interpretable fashion.
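The detect-then-repair loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy cart dynamics, the names `simulate`, `inconsistency`, and `repair`, the parameter set, the step sizes, and the threshold are all assumptions made for illustration. The sketch scores a model by the mean Euclidean distance between predicted and observed states, flags a novelty when that score exceeds a threshold, and then runs a best-first search over discrete parameter edits, using the same score as the heuristic.

```python
import heapq
import math


def simulate(params, state, actions, dt=0.02):
    """Hypothetical toy model: a cart pushed left/right, with mass and friction."""
    x, v = state
    trajectory = []
    for a in actions:
        acc = (a * params["force"] - params["friction"] * v) / params["mass"]
        v += acc * dt
        x += v * dt
        trajectory.append((x, v))
    return trajectory


def inconsistency(params, state, actions, observed):
    """Mean Euclidean distance between predicted and observed states."""
    predicted = simulate(params, state, actions)
    return sum(math.dist(p, o) for p, o in zip(predicted, observed)) / len(observed)


def repair(params, state, actions, observed, deltas, threshold, max_expansions=500):
    """Best-first search over parameter edits, guided by the inconsistency score.

    A search node is a tuple of integer step counts, one per editable parameter;
    the candidate value is params[name] + steps * deltas[name].
    """
    names = sorted(deltas)

    def apply(steps):
        edits = {n: params[n] + k * deltas[n] for n, k in zip(names, steps)}
        return {**params, **edits}

    def score(steps):
        return inconsistency(apply(steps), state, actions, observed)

    start = tuple(0 for _ in names)
    frontier = [(score(start), start)]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            break
        current_score, steps = heapq.heappop(frontier)
        if current_score <= threshold:
            return apply(steps)  # model is consistent with the observations again
        for i, name in enumerate(names):
            for direction in (-1, 1):
                nxt = steps[:i] + (steps[i] + direction,) + steps[i + 1:]
                if nxt in seen or apply(nxt)[name] <= 0:
                    continue  # skip revisits and non-physical (non-positive) values
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt))
    return None  # no consistent model found within the search budget


# Assumed scenario: the cart's mass silently doubles (the "novelty").
nominal = {"mass": 1.0, "force": 10.0, "friction": 0.1}
novel_world = {**nominal, "mass": 2.0}
start_state = (0.0, 0.0)
actions = [1] * 20 + [-1] * 20
observed = simulate(novel_world, start_state, actions)

threshold = 1e-3
novelty_detected = inconsistency(nominal, start_state, actions, observed) > threshold
repaired = repair(nominal, start_state, actions, observed,
                  deltas={"mass": 0.5, "friction": 0.05}, threshold=threshold)
```

Because each repair is a list of explicit parameter edits (here, a change to `mass`), the revised model can be read directly, which is one way to picture the interpretability the abstract refers to.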