To develop topic modeling methods that can be applied quickly to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in topic models is closely related to non-negative matrix factorization (NMF). Yet, to our knowledge, this relationship has not previously been exploited to fit topic models. We show that recent advances in NMF optimization methods can be leveraged to fit topic models very efficiently, often yielding much better fits in less time than existing algorithms for topic models. We also formalize the connection between the NMF optimization problem and maximum-likelihood estimation for the topic model, and we use this result to show that the expectation-maximization (EM) algorithm for the topic model is essentially the same as the classic multiplicative updates for NMF; the only difference is the order in which the operations are performed. Our methods are implemented in the R package fastTopics.
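To make the NMF side of this connection concrete, the following is a minimal sketch of the classic multiplicative updates for NMF under a Poisson (generalized Kullback-Leibler) objective, written in plain Python rather than R. It is not the fastTopics implementation and does not use the faster NMF methods the abstract refers to. The names `fit`, `multiplicative_update`, and `to_topic_model`, the toy count matrix `X`, and the choice of two topics are all assumptions made for illustration. The final rescaling step assumes the usual normalization in which each factor is scaled to a probability vector over words and each row of loadings to topic proportions.

```python
import math
import random

# Toy document-by-word count matrix (hypothetical data, no all-zero rows or columns).
X = [[5, 3, 0, 1],
     [4, 2, 1, 0],
     [0, 1, 6, 4],
     [1, 0, 5, 3],
     [3, 3, 3, 3]]

def rates(L, F):
    """Return the n x m matrix of Poisson rates, Lambda = L F^T."""
    return [[sum(l * f for l, f in zip(Li, Fj)) for Fj in F] for Li in L]

def poisson_loglik(X, L, F):
    """Poisson log-likelihood of X, dropping the constant -log(x!) terms."""
    Lam = rates(L, F)
    return sum(x * math.log(lam) - lam
               for Xi, Lami in zip(X, Lam) for x, lam in zip(Xi, Lami))

def multiplicative_update(X, L, F):
    """One pass of the multiplicative updates: first L, then F."""
    n, m, k = len(X), len(F), len(F[0])
    Lam = rates(L, F)
    col_F = [sum(F[j][t] for j in range(m)) for t in range(k)]
    L = [[L[i][t] * sum(X[i][j] / Lam[i][j] * F[j][t] for j in range(m)) / col_F[t]
          for t in range(k)] for i in range(n)]
    Lam = rates(L, F)  # recompute the rates with the updated loadings
    col_L = [sum(L[i][t] for i in range(n)) for t in range(k)]
    F = [[F[j][t] * sum(X[i][j] / Lam[i][j] * L[i][t] for i in range(n)) / col_L[t]
          for t in range(k)] for j in range(m)]
    return L, F

def fit(X, k, iters=50, seed=0):
    """Run the updates from a random positive start; also return the log-likelihood trace."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    L = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(n)]
    F = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(m)]
    trace = [poisson_loglik(X, L, F)]
    for _ in range(iters):
        L, F = multiplicative_update(X, L, F)
        trace.append(poisson_loglik(X, L, F))
    return L, F, trace

def to_topic_model(L, F):
    """Rescale Poisson NMF factors into topic proportions and word probabilities."""
    k = len(F[0])
    u = [sum(Fj[t] for Fj in F) for t in range(k)]           # column sums of F
    word_probs = [[Fj[t] / u[t] for t in range(k)] for Fj in F]
    scaled = [[Li[t] * u[t] for t in range(k)] for Li in L]
    topic_props = [[s / sum(row) for s in row] for row in scaled]
    return topic_props, word_probs

L, F, trace = fit(X, k=2, iters=30)
topic_props, word_probs = to_topic_model(L, F)
```

In this sketch, `trace` records the Poisson log-likelihood after each pass, which the multiplicative updates should never decrease. After rescaling, each row of `topic_props` and each column of `word_probs` sums to one, so the Poisson NMF fit can be read as a multinomial topic model.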