Although deep NLP models have revolutionized the field, they remain black boxes, which motivates research into how they make decisions. Recent work by Dalvi et al. (2022) carried out representation analysis by clustering the latent spaces of pre-trained language models (PLMs), but that approach is limited to small-scale settings because agglomerative hierarchical clustering is expensive to run. This paper studies clustering algorithms in order to scale the discovery of encoded concepts in PLM representations to larger datasets and models. We propose metrics for assessing the quality of discovered latent concepts and use them to compare the clustering algorithms studied. We find that K-Means-based concept discovery is significantly more efficient while maintaining the quality of the obtained concepts. Furthermore, we demonstrate the practical value of this efficiency by scaling latent concept discovery to large language models (LLMs) and to phrasal concepts.
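As a rough illustration of what K-Means-based concept discovery might look like, the sketch below clusters a handful of token vectors and groups the tokens by cluster. It is not the authors' pipeline: the tokens and their 2-D vectors are made up for the example (in practice they would be contextual representations taken from a PLM layer), the function names are hypothetical, and the evenly spaced centroid initialization is a simplification chosen to keep the run deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's algorithm; returns one cluster label per input point."""
    n = len(points)
    # Simplified, deterministic initialization: k evenly spaced input points.
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical tokens with made-up 2-D "representations".
tokens = ["Paris", "London", "Berlin", "ran", "walked", "jumped"]
vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

labels = kmeans(vectors, k=2)

# Group tokens by cluster; each group is a candidate latent concept.
concepts: Dict[int, List[str]] = defaultdict(list)
for token, label in zip(tokens, labels):
    concepts[label].append(token)

for label, members in sorted(concepts.items()):
    print(label, members)
```

Each K-Means iteration costs time roughly linear in the number of points, whereas standard agglomerative clustering works from pairwise distances between all points, which is why the former is the more natural candidate for scaling.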