We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word and the corresponding data-driven usage clusters (i.e., word senses), a specialised Flan-T5 language model generates a definition for each usage, and the most prototypical definition in each usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they allow users (historical linguists, lexicographers, or social scientists) to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the 'definitions as representations' paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a promising new type of lexical representation for NLP.
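The sense-labelling step described above can be illustrated with a minimal sketch. It assumes the definitions have already been generated for each usage and that "most prototypical" means closest to the cluster centroid in some vector space; the abstract does not specify the measure, so this is one plausible reading rather than the authors' method. A toy bag-of-words vector stands in for a real sentence embedding, and all function names (`bow_vector`, `most_prototypical`, `label_senses`) are hypothetical.

```python
from collections import Counter
import math


def bow_vector(text):
    """Toy stand-in for a sentence embedding: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return Counter({t: c / n for t, c in total.items()})


def most_prototypical(definitions):
    """Return the definition closest to the centroid of its cluster.

    Assumption: prototypicality = cosine similarity to the cluster mean.
    Ties are broken in favour of the earliest definition.
    """
    if not definitions:
        raise ValueError("a usage cluster needs at least one definition")
    vectors = [bow_vector(d) for d in definitions]
    center = centroid(vectors)
    scores = [cosine(v, center) for v in vectors]
    best = max(range(len(definitions)), key=scores.__getitem__)
    return definitions[best]


def label_senses(definitions, cluster_ids):
    """Map each usage cluster id to its most prototypical definition."""
    clusters = {}
    for definition, cluster_id in zip(definitions, cluster_ids):
        clusters.setdefault(cluster_id, []).append(definition)
    return {cid: most_prototypical(defs) for cid, defs in clusters.items()}


if __name__ == "__main__":
    # Hypothetical generated definitions for usages of "mouse".
    definitions = [
        "a small rodent",
        "a small rodent with a long tail",
        "an animal",
        "a pointing device",
        "a pointing device",
        "computer hardware",
    ]
    cluster_ids = [0, 0, 0, 1, 1, 1]
    for cid, label in label_senses(definitions, cluster_ids).items():
        print(cid, label)
```

In practice, `bow_vector` would be replaced by a proper sentence encoder; the selection logic stays the same.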