We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory-bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits whose size is bounded by the planning horizon and adapts automatically to the underlying domain, requiring no prior domain knowledge beyond a generative model. We evaluate SYMBOL empirically on four large POMDP benchmark problems to demonstrate its effectiveness and its robustness with respect to the choice of hyperparameters, and we measure its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and with POMCP.
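To make the idea of a horizon-bounded stack of Thompson Sampling bandits concrete, the following is a minimal sketch and not the SYMBOL algorithm itself. Several parts are assumptions made for illustration: the simplified Gaussian posterior in `GaussianThompsonBandit`, the rule that grows the stack by one bandit per simulation (the abstract does not state the actual adaptation rule), the uniformly random rollout below the stack, and the toy generative model `chain_step`. Observations are omitted because an open-loop planner selects actions by depth only.

```python
import math
import random


class GaussianThompsonBandit:
    """Thompson Sampling over per-action returns (simplified Gaussian posterior, an assumption)."""

    def __init__(self, n_actions, rng):
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions
        self.rng = rng

    def select(self):
        # Draw one plausible mean per action and act greedily on the draws.
        samples = [
            self.rng.gauss(m, 1.0 / math.sqrt(c + 1))
            for m, c in zip(self.means, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, ret):
        # Incremental mean of the returns observed for this action.
        self.counts[action] += 1
        self.means[action] += (ret - self.means[action]) / self.counts[action]

    def best(self):
        return max(range(len(self.means)), key=self.means.__getitem__)


def plan(step, start_state, n_actions, horizon, n_simulations, seed=0):
    """Open-loop planning with one bandit per depth; the stack never exceeds `horizon`."""
    rng = random.Random(seed)
    stack = [GaussianThompsonBandit(n_actions, rng)]
    for _ in range(n_simulations):
        state, actions, rewards = start_state, [], []
        for depth in range(horizon):
            if depth < len(stack):
                action = stack[depth].select()
            else:
                action = rng.randrange(n_actions)  # random rollout below the stack
            state, reward = step(state, action)
            actions.append(action)
            rewards.append(reward)
        # Back up the return observed from each depth onward.
        ret = 0.0
        for depth in reversed(range(horizon)):
            ret += rewards[depth]
            if depth < len(stack):
                stack[depth].update(actions[depth], ret)
        # Hypothetical growth rule (not taken from the paper): add one bandit
        # per simulation until the stack reaches the planning horizon.
        if len(stack) < horizon:
            stack.append(GaussianThompsonBandit(n_actions, rng))
    return stack


def chain_step(alive, action):
    """Toy generative model (hypothetical): reward 1 while the agent keeps choosing action 1."""
    if alive and action == 1:
        return True, 1.0
    return False, 0.0


stack = plan(chain_step, True, n_actions=2, horizon=5, n_simulations=200)
print(len(stack), stack[0].best())
```

Because each depth holds exactly one bandit, memory in this sketch grows at most linearly with the planning horizon, regardless of how many simulations are run.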