The field of Reinforcement Learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies policies represented as programs, that is, policies that may involve higher-order constructs such as control loops. Although programmatic RL has attracted considerable attention at the intersection of the machine learning and formal methods communities, very little is known about its theoretical foundations. What are good classes of programmatic policies? How large are optimal programmatic policies? How can we learn them? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL.