Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large neural topic models have a considerable memory footprint. In this paper, we propose a knowledge distillation framework to compress a contextualized topic model without loss in topic quality. In particular, the proposed distillation objective is to minimize the cross-entropy of the soft labels produced by the teacher and the student models, as well as to minimize the squared 2-Wasserstein distance between the latent distributions learned by the two models. Experiments on two publicly available datasets show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model, and even surpasses the teacher while containing far fewer parameters. The distilled model also outperforms several other competitive topic models on topic coherence.
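The two-part distillation objective described in the abstract can be sketched numerically. The following is a minimal illustration, assuming diagonal Gaussian latent distributions (as is standard in VAE-based neural topic models) and temperature-softened softmax outputs as the soft labels; the function names, the temperature `T`, and the weighting factor `lam` are illustrative assumptions, not specifics from the paper.

```python
import numpy as np

def softened_softmax(logits, T):
    # Temperature-softened softmax, computed stably by subtracting the row max.
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_label_cross_entropy(teacher_logits, student_logits, T=2.0):
    # Cross-entropy of the student's softened predictions under the
    # teacher's soft labels, averaged over the batch.
    p_teacher = softened_softmax(teacher_logits, T)
    p_student = softened_softmax(student_logits, T)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean())

def squared_w2_diagonal_gaussians(mu_t, sigma_t, mu_s, sigma_s):
    # Squared 2-Wasserstein distance between diagonal Gaussians:
    # ||mu_t - mu_s||^2 + ||sigma_t - sigma_s||^2, averaged over the batch.
    per_sample = ((mu_t - mu_s) ** 2).sum(axis=-1) + \
                 ((sigma_t - sigma_s) ** 2).sum(axis=-1)
    return float(per_sample.mean())

def distillation_loss(teacher_logits, student_logits,
                      mu_t, sigma_t, mu_s, sigma_s, T=2.0, lam=1.0):
    # Combined objective: soft-label cross-entropy plus a weighted
    # squared 2-Wasserstein term on the latent distributions.
    ce = soft_label_cross_entropy(teacher_logits, student_logits, T)
    w2 = squared_w2_diagonal_gaussians(mu_t, sigma_t, mu_s, sigma_s)
    return ce + lam * w2
```

Note that when the student exactly matches the teacher, the Wasserstein term vanishes and the cross-entropy term reduces to the entropy of the teacher's softened distribution, which is the minimum the student can reach.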