Prior work on uncovering the knowledge encoded within pre-trained language models relies on annotated corpora or human-in-the-loop methods. However, these approaches are limited in scalability and in the scope of interpretation. We propose using a large language model, ChatGPT, as an annotator to enable fine-grained interpretation analysis of pre-trained language models. We discover latent concepts within pre-trained language models by applying agglomerative hierarchical clustering over contextualized representations, and then annotate these concepts using ChatGPT. Our findings demonstrate that ChatGPT produces accurate annotations that are semantically richer than human-annotated concepts. Additionally, we showcase how GPT-based annotations empower interpretation analysis methodologies, of which we demonstrate two: probing frameworks and neuron interpretation. To facilitate further exploration and experimentation in the field, we make available a substantial ConceptNet dataset (TCN) comprising 39,000 annotated concepts.
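The concept-discovery step described above (agglomerative hierarchical clustering over contextualized representations, followed by LLM annotation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-dimensional vectors stand in for real contextualized representations, average linkage with Euclidean distance is one possible linkage choice rather than the one confirmed by the abstract, and `build_annotation_prompt` is a hypothetical helper whose wording is not the prompt used in the work. No model is actually queried.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative_clusters(vectors, num_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per vector and repeatedly merges the two
    closest clusters until num_clusters remain. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_clusters:
        best = None  # (average distance, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

def build_annotation_prompt(words):
    """Hypothetical prompt template for an LLM annotator (illustrative only)."""
    return (
        "Give a short label describing what these words have in common: "
        + ", ".join(words)
    )

# Toy stand-ins for contextualized token representations (made-up values).
tokens = ["Paris", "London", "Berlin", "red", "green", "blue"]
vectors = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3),   # city-like region
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),   # color-like region
]

# Each resulting cluster plays the role of one latent concept.
clusters = agglomerative_clusters(vectors, num_clusters=2)
concepts = [sorted(tokens[i] for i in cluster) for cluster in clusters]

# One prompt per concept; sending it to an annotator model is left out.
prompts = [build_annotation_prompt(words) for words in concepts]
```

In a realistic setting the vectors would be high-dimensional activations extracted from a chosen layer of the pre-trained model, and a library implementation of hierarchical clustering would replace the quadratic loop above, which is kept only for readability.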