In general reinforcement learning, all established optimal agents, including AIXI, are model-based: they explicitly maintain and use environment models. This paper introduces Universal AI with Q-Induction (AIQI), the first model-free agent proven to be asymptotically $\varepsilon$-optimal in general RL. AIQI performs universal induction over distributional action-value functions rather than over policies or environments, as in prior work. Under a grain-of-truth condition, we prove that AIQI is strongly asymptotically $\varepsilon$-optimal and asymptotically $\varepsilon$-Bayes-optimal. Our results significantly expand the diversity of known universal agents.