Neural Topic Models (NTMs), inspired by variational autoencoders, have recently attracted considerable research interest; however, their real-world applicability is limited by the difficulty of incorporating human knowledge. This work presents vONTSS, a semi-supervised neural topic modeling method that combines von Mises-Fisher (vMF) based variational autoencoders with optimal transport. Given a few keywords per topic, vONTSS in the semi-supervised setting generates potential topics and optimizes both topic-keyword quality and topic classification. Experiments show that vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity. vONTSS also supports unsupervised topic modeling. Quantitative and qualitative experiments show that, in the unsupervised setting, vONTSS outperforms recent NTMs in multiple respects: it discovers highly clustered and coherent topics on benchmark datasets. It is also much faster than the state-of-the-art weakly supervised text classification method while achieving similar classification performance. We further prove the equivalence of optimal transport loss and cross-entropy loss at the global minimum.