Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework

Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or pre-defined rules, which inevitably results in a cumbersome and expensive learning process. In this paper, we introduce a novel initiative advisor-in-the-loop actor-critic framework, termed Ask-AC, that replaces the unilateral advisor-guidance mechanism with a bidirectional learner-initiative one, and thereby enables a customized and efficient message exchange between learner and advisor. At the heart of Ask-AC are two complementary components, namely an action requester and an adaptive state selector, that can be readily incorporated into various discrete actor-critic architectures. The former allows the agent to proactively seek advisor intervention when it encounters uncertain states, while the latter identifies unstable states potentially missed by the former, especially when the environment changes, and then learns to promote the ask action on such states. Experimental results on both stationary and non-stationary environments and across different actor-critic backbones demonstrate that the proposed framework significantly improves the learning efficiency of the agent and achieves performance on par with that obtained by continuous advisor monitoring.
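The abstract gives no implementation details, so the following is only a minimal sketch of the learner-initiative idea, under the assumption that the "ask action" is modeled as one extra output of a discrete policy head. The names `ASK`, `softmax`, `select_action`, and the `advisor` callable are hypothetical and are not taken from the paper; the adaptive state selector is not modeled here.

```python
import math
import random

ASK = "ask"  # hypothetical label for the extra "ask the advisor" action


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, env_actions, advisor, state, rng=random):
    """Sample from a policy over env_actions plus one ASK slot.

    Assumption: the last logit belongs to the ask action. When ASK is
    sampled, the advisor (any callable state -> action) chooses the
    environment action instead. Returns (action, asked).
    """
    if len(logits) != len(env_actions) + 1:
        raise ValueError("expected one logit per environment action plus one for ASK")
    probs = softmax(logits)
    choices = list(env_actions) + [ASK]
    picked = rng.choices(choices, weights=probs, k=1)[0]
    if picked == ASK:
        return advisor(state), True
    return picked, False
```

In this sketch the advisor is consulted only on steps where the learner itself samples the ask action, which is the contrast the abstract draws with continuous monitoring. How the ask probability is trained, and how missed unstable states are detected, is left to the paper.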

