When learning to walk, infants seem to address a coarse version of the problem first (stay upright, reach the caregiver) and refine it only when further practice at that resolution stops paying off. Reinforcement learning offers multiple techniques for building simplified versions of complex tasks, but lacks general principles for dynamically adjusting the granularity of these abstractions during learning. This paper proposes one such principle: refine the abstraction as soon as the learning error within it becomes comparable to the error induced by the abstraction itself. We investigate one way of formalising this principle via a performance certificate that decomposes value error into two terms: a learning error bound captured by a Bellman residual, and an abstraction error bound given by a bisimulation metric. The resulting switching strategy is implemented with soft state-action abstractions built from rate-distortion principles, whose resolution along the state and action axes can be adjusted continuously. We validate this construction in a range of tabular settings, showing that near-optimal performance can be achieved under substantial lossy compression of state and action information.
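To make the switching principle concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a tiny hand-built two-state MDP standing in for the abstract problem, uses the standard bound that the sup-norm Bellman residual divided by (1 - gamma) upper-bounds the distance to the optimal value function, and takes the abstraction error as a supplied number (the hypothetical `abstraction_bound`) instead of computing it from a bisimulation metric. Reading "comparable" as "less than or equal to" is also an assumption, and all function names are illustrative.

```python
import numpy as np


def bellman_residual(P, R, Q, gamma):
    """Sup-norm Bellman optimality residual ||TQ - Q||_inf.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    Q: (S, A) current action-value estimate.
    """
    TQ = R + gamma * (P @ Q.max(axis=1))  # one-step optimality backup
    return np.abs(TQ - Q).max()


def learning_error_bound(P, R, Q, gamma):
    """Standard bound: ||Q - Q*||_inf <= ||TQ - Q||_inf / (1 - gamma)."""
    return bellman_residual(P, R, Q, gamma) / (1.0 - gamma)


def should_refine(learning_bound, abstraction_bound):
    """Assumed rule: refine once learning error is no larger than abstraction error."""
    return learning_bound <= abstraction_bound


def sweeps_until_refine(P, R, gamma, abstraction_bound, max_sweeps=1000):
    """Run value-iteration sweeps until the assumed switching rule fires."""
    Q = np.zeros_like(R, dtype=float)
    for sweep in range(max_sweeps):
        if should_refine(learning_error_bound(P, R, Q, gamma), abstraction_bound):
            return sweep, Q
        Q = R + gamma * (P @ Q.max(axis=1))
    return max_sweeps, Q


# Hypothetical two-state, two-action MDP: action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
R = np.array([[0.0, 1.0], [1.0, 0.0]])
gamma = 0.9
abstraction_bound = 0.5  # placeholder value, not derived from a bisimulation metric

sweeps, Q = sweeps_until_refine(P, R, gamma, abstraction_bound)
print(f"switching rule fired after {sweeps} sweeps")
```

In this sketch, further sweeps past the switching point would reduce only the learning term, while the total certificate stays dominated by the fixed abstraction term, which is the intuition behind refining at that point.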