Reinforcement learning can be used to learn amortised policies for designing sequences of experiments. However, current amortised methods rely on estimators of expected information gain (EIG) that need a number of samples exponential in the magnitude of the EIG to achieve unbiased estimation. We propose an alternative estimator based on the cross-entropy between the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method overcomes the exponential sample complexity of previous approaches and provides more accurate estimates of high EIG values. More importantly, it allows superior design policies to be learned, and it is compatible with continuous and discrete design spaces, non-differentiable likelihoods, and even implicit probabilistic models.
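To make the idea of a cross-entropy estimator concrete, the following is a minimal sketch on a toy problem, not the method's actual implementation. It assumes one common form of such an estimator, the variational lower bound EIG ≥ H[p(θ)] + E[log q(θ | y)], which the abstract does not spell out. The model (a standard normal prior with a linear-Gaussian likelihood), the function names `cross_entropy_eig` and `exact_eig`, and the use of the exact posterior in place of a learned proposal are all illustrative assumptions.

```python
import numpy as np


def exact_eig(d, sigma=1.0):
    """Closed-form EIG for theta ~ N(0, 1), y = d * theta + N(0, sigma^2)."""
    return 0.5 * np.log1p(d**2 / sigma**2)


def cross_entropy_eig(d, sigma=1.0, n=100_000, seed=0):
    """Monte Carlo estimate of H[p(theta)] + E[log q(theta | y)].

    Assumption: q is the exact Gaussian posterior of this toy model.
    In an amortised setting q would instead be a learned, flexible proposal.
    """
    rng = np.random.default_rng(seed)

    # Sample from the joint model distribution p(theta) p(y | theta, d).
    theta = rng.standard_normal(n)
    y = d * theta + sigma * rng.standard_normal(n)

    # Exact posterior for the linear-Gaussian model (stands in for q).
    post_var = 1.0 / (1.0 + d**2 / sigma**2)
    post_mean = post_var * d * y / sigma**2

    # Log-density of the proposal evaluated at the joint samples.
    log_q = (-0.5 * np.log(2.0 * np.pi * post_var)
             - 0.5 * (theta - post_mean) ** 2 / post_var)

    # Entropy of the standard normal prior.
    prior_entropy = 0.5 * np.log(2.0 * np.pi * np.e)

    return prior_entropy + log_q.mean()


if __name__ == "__main__":
    for d, sigma in [(2.0, 1.0), (100.0, 0.1)]:
        est = cross_entropy_eig(d, sigma)
        print(f"d={d}, sigma={sigma}: estimate={est:.3f}, exact={exact_eig(d, sigma):.3f}")
```

In this sketch the Monte Carlo error does not depend on the size of the EIG, because the proposal matches the posterior exactly. With an imperfect proposal the same quantity is only a lower bound, and its tightness depends on how well the proposal approximates the posterior.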