We apply reinforcement learning to topic modeling by replacing the variational autoencoder in ProdLDA with a reinforcement learning policy over a continuous action space. We train the system with the REINFORCE policy gradient algorithm. We also introduce several modifications: we modernize the neural network architecture, weight the ELBO loss, use contextual embeddings, and monitor the learning process by computing topic diversity and coherence at each training step. Experiments are performed on 11 data sets. Our unsupervised model outperforms all other unsupervised models and performs on par with or better than most models that use supervised labeling. On certain data sets, it is outperformed by a model that uses supervised labeling and contrastive learning. We also conduct an ablation study to provide empirical evidence of the performance improvements from our changes to ProdLDA, and find that the reinforcement learning formulation boosts performance.
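
To make the monitoring step concrete, the following is a minimal sketch of a topic diversity computation. It assumes the commonly used definition (the fraction of unique words among the top-k words of all topics); the abstract does not state which definition or value of k the authors use, so the function name `topic_diversity`, the parameter `top_k`, and the toy topics are illustrative assumptions rather than the paper's implementation.

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    `topics` is a list of word lists, each sorted from most to least
    probable. This follows a common definition of topic diversity;
    it is an assumption, not necessarily the exact metric in the paper.
    """
    # Pool the top_k words of each topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    # 1.0 means no word is shared between topics; values near 0 mean
    # the topics are largely redundant.
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics, for illustration only.
topics = [
    ["game", "team", "season", "player"],
    ["space", "nasa", "orbit", "launch"],
    ["game", "player", "score", "launch"],
]
diversity = topic_diversity(topics, top_k=4)
print(diversity)
```

Calling such a function at each training step, alongside a coherence score, would give the kind of per-step learning curve the abstract describes. The third toy topic repeats three words from the first two, which pulls the score below 1.0.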