In a sequential decision-making problem, the information structure describes how events occurring at different points in time affect each other. Classical models of reinforcement learning (e.g., MDPs, POMDPs) assume a simple and highly regular information structure, while more general models such as predictive state representations do not model the information structure explicitly. By contrast, real-world sequential decision-making problems typically involve complex, time-varying interdependence among system variables, which calls for a rich and flexible representation of information structure. In this paper, we formalize a novel reinforcement learning model that explicitly represents the information structure. We then use this model to carry out an information-structural analysis of the statistical hardness of general sequential decision-making problems, characterizing it via a graph-theoretic quantity of the directed acyclic graph (DAG) representation of the information structure. We prove an upper bound on the sample complexity of learning a general sequential decision-making problem in terms of its information structure, and exhibit an algorithm that achieves this bound. This recovers known tractability results and gives a novel perspective on reinforcement learning in general sequential decision-making problems, providing a systematic way of identifying new tractable classes of problems.
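The DAG view of information structure can be illustrated with a minimal Python sketch. The encoding below (a dict mapping each variable to the set of variables that directly affect it) and the names `mdp_structure`, `pomdp_structure`, and `topological_order` are assumptions made for illustration, not the paper's formalism. The graph-theoretic quantity used in the analysis is not computed here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def mdp_structure(horizon):
    """Hypothetical encoding: parents of each variable in a fully observed MDP."""
    parents = {"s0": set()}
    for t in range(horizon):
        # Assumed convention: the action depends on the current state only.
        parents[f"a{t}"] = {f"s{t}"}
        # Markov transition: the next state depends on current state and action.
        parents[f"s{t + 1}"] = {f"s{t}", f"a{t}"}
    return parents


def pomdp_structure(horizon):
    """Hypothetical encoding: parents of each variable in a POMDP."""
    parents = {"s0": set()}
    for t in range(horizon):
        # The observation depends on the hidden state.
        parents[f"o{t}"] = {f"s{t}"}
        # The action may depend on all past observations and actions.
        history = {f"o{k}" for k in range(t + 1)} | {f"a{k}" for k in range(t)}
        parents[f"a{t}"] = history
        parents[f"s{t + 1}"] = {f"s{t}", f"a{t}"}
    return parents


def topological_order(parents):
    """Return the variables in an order consistent with the DAG.

    Raises graphlib.CycleError if the structure is not acyclic.
    """
    return list(TopologicalSorter(parents).static_order())
```

For example, `pomdp_structure(3)["a2"]` contains the observations `o0`, `o1`, `o2` and the actions `a0`, `a1`, whereas `mdp_structure(3)["a2"]` contains only `s2`. The two classical models differ in which parent sets they fix in advance, and a more general model would leave these sets free to vary from one time step to the next.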