A combination of deep reinforcement learning and supervised learning is proposed for the problem of active sequential hypothesis testing in completely unknown environments. We make no assumptions about the prior probability, the action and observation sets, or the observation-generating process. Our method can be used in any environment, even one with continuous observations or actions, and it performs competitively with, and sometimes better than, the Chernoff test in both finite- and infinite-horizon problems, despite not having access to the environment dynamics.
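The Chernoff test used as the baseline differs from the proposed method in one key way: it needs the observation model for every hypothesis and action. The sketch below is a simplified illustration of that baseline, not the method proposed here. It assumes finite hypothesis, action, and observation sets with a known, strictly positive model `MODEL[h][a][y]`. At each step it picks the action that maximizes the minimum KL divergence between the currently most likely hypothesis and every alternative, then stops once the posterior of the leading hypothesis reaches a threshold. Chernoff's original rule may randomize over actions and uses a likelihood-ratio stopping rule. The pure-action choice and posterior-threshold stop are simplifying assumptions, and all names (`MODEL`, `chernoff_action`, `update`, `chernoff_test`) are hypothetical.

```python
import math
import random

# Hypothetical known model: MODEL[h][a][y] = P(observation y | hypothesis h, action a).
# All entries are strictly positive so the logs and KL terms below stay finite.
MODEL = [
    [[0.9, 0.1], [0.6, 0.4]],    # hypothesis 0
    [[0.1, 0.9], [0.5, 0.5]],    # hypothesis 1
    [[0.5, 0.5], [0.02, 0.98]],  # hypothesis 2
]


def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chernoff_action(model, log_post):
    """Simplified (pure-action) Chernoff rule: argmax over actions a of
    min over alternatives j of D(model[i_hat][a] || model[j][a])."""
    n_hyp, n_act = len(model), len(model[0])
    i_hat = max(range(n_hyp), key=lambda i: log_post[i])  # current most likely hypothesis

    def worst_case_divergence(a):
        return min(kl(model[i_hat][a], model[j][a]) for j in range(n_hyp) if j != i_hat)

    return max(range(n_act), key=worst_case_divergence)


def update(log_post, model, action, obs):
    """Bayes rule in log space, normalized with the log-sum-exp trick."""
    new = [lp + math.log(model[i][action][obs]) for i, lp in enumerate(log_post)]
    m = max(new)
    log_z = m + math.log(sum(math.exp(x - m) for x in new))
    return [x - log_z for x in new]


def chernoff_test(model, prior, true_h, threshold=0.99, max_steps=1000, rng=None):
    """Run one episode in a simulated environment where true_h holds.
    Returns (declared hypothesis, number of samples used)."""
    if rng is None:
        rng = random.Random(0)
    log_post = [math.log(p) for p in prior]
    best = max(range(len(log_post)), key=lambda i: log_post[i])
    for t in range(1, max_steps + 1):
        a = chernoff_action(model, log_post)
        # The simulated environment draws an observation under the true hypothesis.
        dist = model[true_h][a]
        obs = rng.choices(range(len(dist)), weights=dist)[0]
        log_post = update(log_post, model, a, obs)
        best = max(range(len(log_post)), key=lambda i: log_post[i])
        # Stop once the leading hypothesis is sufficiently probable.
        if math.exp(log_post[best]) >= threshold:
            return best, t
    return best, max_steps
```

For example, `chernoff_test(MODEL, [1/3, 1/3, 1/3], true_h=2, rng=random.Random(1))` returns the declared hypothesis and the number of samples used. Every step of the sketch reads `MODEL`, which is exactly the access to environment dynamics that the proposed method does without.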