Negative sampling has emerged as an effective technique that enables deep learning models to learn better representations by introducing a learn-to-compare paradigm. The goal of this approach is to make deep learning models more robust by contrasting positive samples against negative ones, thereby learning richer representations. Although its effectiveness has been demonstrated in many areas of computer vision and natural language processing, the effect of negative sampling in an unsupervised setting such as topic modeling remains largely unexplored. In this paper, we present a comprehensive analysis of the impact of different negative sampling strategies on neural topic models. We compare the performance of several popular neural topic models by incorporating a negative sampling technique into the decoder of variational autoencoder-based neural topic models. Experiments on four publicly available datasets demonstrate that integrating negative sampling into topic models yields significant improvements on multiple fronts, including better topic coherence, richer topic diversity, and more accurate document classification. Manual evaluations likewise indicate that incorporating negative sampling into neural topic models enhances the quality of the generated topics. These findings highlight the potential of negative sampling as a valuable tool for advancing the effectiveness of neural topic models.
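To make the decoder-side idea concrete, the following is a minimal sketch of how a negative sampling term could be attached to the reconstruction loss of a VAE-based neural topic model. The function name, the heuristic of perturbing the document-topic vector to obtain a negative sample, and the hinge formulation are all illustrative assumptions for exposition, not the specific strategy evaluated in the paper.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ntm_decoder_loss(theta, beta, bow, margin=1.0):
    """Hypothetical sketch: NTM reconstruction loss with a
    triplet-style negative sampling term in the decoder.

    theta: (K,) document-topic proportions from the encoder
    beta:  (K, V) topic-word decoder weights (logits)
    bow:   (V,) bag-of-words counts for the document
    """
    eps = 1e-12
    # positive reconstruction: words generated from the true topic mix
    log_p_pos = np.log(softmax(theta @ beta) + eps)
    pos_ll = float((bow * log_p_pos).sum())
    # negative sample (illustrative heuristic): suppress the dominant
    # topic in theta and renormalize to get a corrupted topic mix
    theta_neg = theta.copy()
    theta_neg[np.argmax(theta)] = 0.0
    theta_neg = theta_neg / (theta_neg.sum() + eps)
    log_p_neg = np.log(softmax(theta_neg @ beta) + eps)
    neg_ll = float((bow * log_p_neg).sum())
    # hinge term: the positive reconstruction should beat the
    # negative one by at least `margin`
    recon = -pos_ll
    triplet = max(0.0, margin - (pos_ll - neg_ll))
    return recon + triplet
```

In a real VAE-based topic model this term would be added to the evidence lower bound alongside the KL regularizer, and the negative sample would typically come from a learned or sampled corruption rather than the fixed heuristic used here.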