Dynamic topic models have been proposed as a tool for historical analysis, but traditional approaches have had limited usefulness, being difficult to configure, interpret, and evaluate. In this work, we experiment with a recent approach for dynamic topic modeling using BERT embeddings. We compare topic models built using traditional statistical models (LDA and NMF) and the BERT-based model, modeling topics over the entire surviving corpus of Roman literature. We find that while quantitative metrics prefer statistical models, qualitative evaluation finds better insights from the neural model. Furthermore, the neural topic model is less sensitive to hyperparameter configuration and thus may make dynamic topic modeling more viable for historical researchers.