Recent studies have demonstrated the usefulness of contextualized word embeddings in unsupervised semantic frame induction. However, they have also revealed that generic contextualized embeddings are not always consistent with human intuitions about semantic frames, which causes unsatisfactory performance for frame induction based on contextualized embeddings. In this paper, we address supervised semantic frame induction, which assumes the existence of frame-annotated data for a subset of predicates in a corpus and aims to build a frame induction model that leverages the annotated data. We propose a model that uses deep metric learning to fine-tune a contextualized embedding model, and we apply the fine-tuned contextualized embeddings to perform semantic frame induction. Our experiments on FrameNet show that fine-tuning with deep metric learning considerably improves the clustering evaluation scores, namely, the B-cubed F-score and Purity F-score, by about 8 points or more. We also demonstrate that our approach is effective even when the number of training instances is small.
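The two clustering scores named above can be illustrated with a minimal sketch. It assumes the standard definitions: B-cubed precision and recall averaged over instances, and the Purity F-score taken as the harmonic mean of purity and inverse purity. The function names `bcubed_f` and `purity_f` and the toy labels are hypothetical, chosen for illustration, and are not taken from the paper's code.

```python
from collections import Counter


def _check(pred, gold):
    # Both label sequences must describe the same, non-empty set of instances.
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")


def bcubed_f(pred, gold):
    """B-cubed F-score (assumed standard definition, averaged per instance)."""
    _check(pred, gold)
    n = len(pred)
    overlap = Counter(zip(pred, gold))   # |cluster ∩ gold frame| per pair
    cluster_size = Counter(pred)
    frame_size = Counter(gold)
    precision = sum(overlap[(p, g)] / cluster_size[p] for p, g in zip(pred, gold)) / n
    recall = sum(overlap[(p, g)] / frame_size[g] for p, g in zip(pred, gold)) / n
    return 2 * precision * recall / (precision + recall)


def purity_f(pred, gold):
    """Harmonic mean of purity and inverse purity (assumed definition)."""
    _check(pred, gold)
    n = len(pred)
    overlap = Counter(zip(pred, gold))
    best_in_cluster, best_in_frame = {}, {}
    for (p, g), count in overlap.items():
        best_in_cluster[p] = max(best_in_cluster.get(p, 0), count)
        best_in_frame[g] = max(best_in_frame.get(g, 0), count)
    purity = sum(best_in_cluster.values()) / n
    inverse_purity = sum(best_in_frame.values()) / n
    return 2 * purity * inverse_purity / (purity + inverse_purity)


# Toy example: four verb instances, two gold frames (labels are illustrative).
gold_frames = ["Motion", "Motion", "Ingestion", "Ingestion"]
perfect = [0, 0, 1, 1]   # clusters match the gold frames exactly
merged = [0, 0, 0, 0]    # everything lumped into a single cluster
```

In this toy setup, a clustering that reproduces the gold frames scores 1.0 on both metrics, while merging all instances into one cluster keeps recall and inverse purity at 1.0 but halves precision and purity, so both F-scores drop.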