Dynamic topic models track the evolution of topics in sequential documents, enabling applications such as trend analysis and opinion mining. However, existing models suffer from two issues, repetitive topics and unassociated topics, which obscure topic evolution and hinder downstream applications. To address these issues, we break with the tradition of simply chaining topics and propose a novel neural Chain-Free Dynamic Topic Model (CFDTM). We introduce an evolution-tracking contrastive learning method that builds similarity relations among dynamic topics. This not only tracks topic evolution but also maintains topic diversity, mitigating the repetitive topic issue. To avoid unassociated topics, we further present an unassociated word exclusion method that consistently excludes unassociated words from discovered topics. Extensive experiments demonstrate that our model significantly outperforms state-of-the-art baselines: it tracks topic evolution with high-quality topics, performs better on downstream tasks, and remains robust to the hyperparameter controlling evolution intensity. Our code is available at https://github.com/bobxwu/CFDTM.
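To make the idea of building similarity relations among dynamic topics concrete, the following is a minimal sketch and not the authors' implementation. It assumes an InfoNCE-style objective in which the same topic index at adjacent time slices forms a positive pair (tracking evolution) and different topics within one slice form negative pairs (preserving diversity). The function name `evolution_tracking_contrastive_loss`, the array `topic_emb`, and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def evolution_tracking_contrastive_loss(topic_emb, temperature=0.1):
    """Illustrative contrastive loss over dynamic topic embeddings.

    topic_emb: array of shape (T, K, D) with T >= 2 time slices,
               K >= 2 topics per slice, and D embedding dimensions.
    This is an assumed InfoNCE-style form, not the paper's exact objective.
    """
    T, K, _ = topic_emb.shape
    # L2-normalise so that dot products are cosine similarities.
    z = topic_emb / np.linalg.norm(topic_emb, axis=-1, keepdims=True)

    total = 0.0
    for t in range(1, T):
        # Positive pair: topic k at slice t and topic k at slice t-1.
        pos = np.sum(z[t] * z[t - 1], axis=-1) / temperature  # shape (K,)
        # Negative pairs: topic k against every other topic in slice t.
        sim = (z[t] @ z[t].T) / temperature  # shape (K, K)
        off_diagonal = ~np.eye(K, dtype=bool)
        neg = np.where(off_diagonal, np.exp(sim), 0.0).sum(axis=-1)
        # Per-topic loss: -log( exp(pos) / (exp(pos) + sum of exp(neg)) ).
        total += np.mean(-pos + np.log(np.exp(pos) + neg))
    return total / (T - 1)
```

Under these assumptions, the loss is small when each topic stays close to its counterpart in the previous slice while remaining distinct from its siblings, and it grows when topics within a slice collapse onto one another, which is the repetitive topic behaviour the abstract describes.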