We present an alternative framework for learning generative models with goal-conditioned reinforcement learning. We define two agents, a goal-conditioned agent (GC-agent) and a supervised agent (S-agent). Starting from a user-specified initial state, the GC-agent learns to reconstruct the training set; in this context, the elements of the training set are the goals. During training, the S-agent learns to imitate the GC-agent while remaining agnostic to the goals. At inference, we generate new samples with the S-agent. Following a route similar to that of variational auto-encoders, we derive an upper bound on the negative log-likelihood that consists of a reconstruction term and a divergence between the GC-agent policy and the (goal-agnostic) S-agent policy. We empirically demonstrate that our method generates diverse, high-quality samples in the task of image synthesis.
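To make the structure of the bound concrete, the following is a minimal sketch of how such a bound can arise, by analogy with the standard variational auto-encoder argument. The notation is an assumption and is not taken from the paper: $x$ is a training example (the goal), $\tau = (s_0, a_0, \dots, s_T)$ is a trajectory from the initial state $s_0$, $\pi_{GC}(\tau \mid x)$ and $\pi_S(\tau)$ are the trajectory distributions induced by the two agents under shared dynamics, and $p(x \mid \tau)$ is a hypothetical likelihood of $x$ given the trajectory. The exact form derived in the paper may differ.

```latex
% Sketch only: symbols x, \tau, \pi_{GC}, \pi_S, p(x|\tau) are assumed notation.
\begin{align}
-\log p_S(x)
  &= -\log \int \pi_S(\tau)\, p(x \mid \tau)\, d\tau \\
  &= -\log \mathbb{E}_{\tau \sim \pi_{GC}(\cdot \mid x)}
       \left[ \frac{\pi_S(\tau)\, p(x \mid \tau)}{\pi_{GC}(\tau \mid x)} \right] \\
  &\le \underbrace{\mathbb{E}_{\tau \sim \pi_{GC}(\cdot \mid x)}
       \left[ -\log p(x \mid \tau) \right]}_{\text{reconstruction term}}
     + \underbrace{D_{\mathrm{KL}}\!\left( \pi_{GC}(\tau \mid x) \,\|\, \pi_S(\tau) \right)}_{\text{divergence between policies}}
     && \text{(Jensen's inequality)}
\end{align}
% If both agents share the same initial state and transition dynamics,
% the trajectory-level divergence decomposes into per-step policy terms:
\begin{equation}
D_{\mathrm{KL}}\!\left( \pi_{GC}(\tau \mid x) \,\|\, \pi_S(\tau) \right)
  = \mathbb{E}_{\tau \sim \pi_{GC}(\cdot \mid x)}
    \left[ \sum_{t=0}^{T-1}
      D_{\mathrm{KL}}\!\left( \pi_{GC}(a_t \mid s_t, x) \,\|\, \pi_S(a_t \mid s_t) \right)
    \right]
\end{equation}
```

Under these assumptions, the trajectory plays the role of the latent variable in a variational auto-encoder, the GC-agent that of the approximate posterior, and the S-agent that of a learned prior.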