Topic models are valuable for understanding large document collections, but they do not always identify the most relevant topics. Classical probabilistic and anchor-based topic models have interactive variants that let users steer the model toward more pertinent topics. Neural topic models, however, have lacked such interactive features. To fill this gap, we introduce a user-friendly interaction for neural topic models: a user assigns a word label to a topic, and the topic model is updated so that the words in that topic align closely with the label. Our approach covers two kinds of neural topic models: those in which topic embeddings are trainable and evolve during training, and those in which topic embeddings are integrated after training, which offers a different route to topic refinement. To support this interaction, we developed an interactive interface that lets users engage with and relabel topics as desired. We evaluate our method in a human study in which users relabel topics to find relevant documents. With our method, user labeling improves document rank scores, helping users find more documents relevant to a given query than without user labeling.
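The labeling interaction can be illustrated with a toy sketch. The abstract does not specify the update rule, so this is not the method itself: it assumes the simplest possible update, in which the relabeled topic's embedding is replaced by the label word's embedding and the topic's words are then re-ranked by cosine similarity. The function names (`relabel_topic`, `top_words`) and the 2-D embeddings are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relabel_topic(topic_embeddings, word_embeddings, topic_id, label):
    """Return a copy of the topic embeddings in which topic `topic_id`
    is set to the embedding of the user-supplied label word.

    ASSUMPTION: direct replacement is an illustrative simplification,
    not the update rule used in the paper.
    """
    updated = [list(vec) for vec in topic_embeddings]
    updated[topic_id] = list(word_embeddings[label])
    return updated


def top_words(topic_vec, word_embeddings, k=3):
    """Rank vocabulary words by cosine similarity to a topic embedding."""
    ranked = sorted(
        word_embeddings,
        key=lambda w: cosine(topic_vec, word_embeddings[w]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-D word embeddings, made up for this example.
WORD_EMBEDDINGS = {
    "vaccine": [0.9, 0.1],
    "virus": [0.8, 0.3],
    "hospital": [0.7, 0.4],
    "election": [0.1, 0.9],
    "senate": [0.2, 0.8],
    "vote": [0.0, 1.0],
}

# Two hypothetical topics; topic 0 starts out mixed between both themes.
TOPIC_EMBEDDINGS = [[0.5, 0.5], [0.1, 0.9]]

before = top_words(TOPIC_EMBEDDINGS[0], WORD_EMBEDDINGS)
updated = relabel_topic(TOPIC_EMBEDDINGS, WORD_EMBEDDINGS, 0, "vaccine")
after = top_words(updated[0], WORD_EMBEDDINGS)

print("before relabeling:", before)
print("after relabeling: ", after)
```

In this toy setting, labeling topic 0 with "vaccine" pulls the health-related words to the top of the topic. A real neural topic model would more plausibly fold the label into training or apply a softer update than outright replacement.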