Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents. However, traditional HTMs often produce hierarchies in which lower-level topics are unrelated to, and not specific enough for, their higher-level topics. Additionally, these methods can be computationally expensive. We present HyHTM, a hyperbolic-geometry-based hierarchical topic model that addresses these limitations by incorporating hierarchical information from hyperbolic geometry to explicitly model hierarchies in topic models. Experimental results with four baselines show that HyHTM can better attend to parent-child relationships among topics. HyHTM produces coherent topic hierarchies that specialise in granularity from generic higher-level topics to specific lower-level topics. Further, our model is significantly faster and leaves a much smaller memory footprint than our best-performing baseline. We have made the source code for our algorithm publicly accessible.
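As a rough illustration of why hyperbolic geometry lends itself to hierarchies, the sketch below computes geodesic distances in the Poincaré ball. The abstract does not state which model of hyperbolic space HyHTM uses, so the Poincaré ball, the function name `poincare_distance`, and the example points are all assumptions made for illustration; this is not the HyHTM algorithm itself.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Illustrative helper (not taken from the HyHTM source code).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical embeddings: a generic concept at the origin and two
# specific concepts placed near the boundary of the ball.
root = (0.0, 0.0)
child_a = (0.9, 0.0)
child_b = (0.0, 0.9)

d_root_a = poincare_distance(root, child_a)
d_a_b = poincare_distance(child_a, child_b)

print(f"root -> child_a:    {d_root_a:.3f}")
print(f"child_a -> child_b: {d_a_b:.3f}")
```

Under these assumptions, the distance between the two boundary points is nearly as large as the path that goes through the origin, which mirrors how siblings in a tree are connected through their parent. This tree-like behaviour is the general property that makes hyperbolic space attractive for modelling parent-child relationships.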