Researchers have formalized reinforcement learning (RL) in different ways. If an agent in one RL framework is to run within another RL framework's environments, the agent must first be converted, or mapped, into that other framework. Whether this is possible depends not only on the RL frameworks in question but also on how intelligence itself is measured. In this paper, we lay foundations for studying relative-intelligence-preserving mappability between RL frameworks. We define two types of mappings between RL frameworks, called weak and strong translations, and prove that the existence of these mappings enables two types of intelligence comparison, by virtue of the mappings preserving relative intelligence. We investigate whether such mappings exist between: (i) RL frameworks where agents go first and RL frameworks where environments go first; and (ii) twelve RL frameworks that differ in whether agents or environments are required to be deterministic. In the former case, we consider various natural mappings from agent-first to environment-first RL and vice versa; we obtain some positive results (some of these mappings are strong or weak translations) and some negative results (others are not). In the latter case, we completely characterize which of the twelve RL-framework pairs admit weak translations, under the assumption of integer-valued rewards and some additional mild assumptions.