Topic models have been studied for decades across a wide range of applications, and neural variational inference has recently revitalized them. However, existing topic models rely on different datasets, implementations, and evaluation settings, which hinders their quick adoption and fair comparison and, in turn, slows research progress. To address these issues, we propose the Topic Modeling System Toolkit (TopMost). Compared to existing toolkits, TopMost covers a wider range of topic modeling scenarios and supports the complete lifecycle: dataset preprocessing, model training, testing, and evaluation. Its highly cohesive and decoupled modular design enables quick use, fair comparison, and flexible extension of different topic models, which can facilitate both research on and applications of topic models. Our code, tutorials, and documentation are available at https://github.com/bobxwu/topmost.