Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, such planners may be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines, achieving higher expected returns under modeling errors and higher accuracy during objective inference.
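To make the phrase "no more committed to a single action than necessary" concrete, the following is a minimal sketch of the standard entropy-regularized (softmax) action choice over a fixed set of action values. It is not the planner proposed here: the function names, the `temperature` parameter, and the example action values are all assumptions made for illustration, and belief-state planning under partial observability is not modeled.

```python
import math

# Illustrative sketch only: all names and numbers below are hypothetical.

def soft_policy(q_values, temperature):
    """Boltzmann policy pi(a) proportional to exp(Q(a) / temperature).

    This distribution maximizes E_pi[Q] + temperature * H(pi)
    over all distributions on a finite action set.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def soft_value(q_values, temperature):
    """Entropy-regularized value: temperature * log sum_a exp(Q(a) / temperature)."""
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )

def entropy(policy):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in policy if p > 0.0)

# Two nearly tied actions and one clearly worse action (assumed values).
q = [1.0, 0.9, 0.0]
for temperature in (0.01, 0.5):
    pi = soft_policy(q, temperature)
    print(temperature, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

At a low temperature the policy is close to greedy, while a higher temperature spreads probability across the near-tied actions. This is the sense in which a regularized policy commits less strongly when the action values do not clearly separate the options.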