Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing

Ultra-fine entity typing plays a crucial role in information extraction by predicting fine-grained semantic types for entity mentions in text. However, the task is challenging because the output space contains a massive number of entity types. Current state-of-the-art approaches, based on standard multi-label classifiers or cross-encoder models, suffer from either poor generalization or inefficient inference. In this paper, we present CASENT, a seq2seq model for ultra-fine entity typing that predicts types together with calibrated confidence scores. Our model takes an entity mention as input and uses constrained beam search to generate multiple types autoregressively. The raw sequence probabilities of the predicted types are then transformed into confidence scores using a novel calibration method. We conduct extensive experiments on the UFET dataset, which contains over 10k types. Our method outperforms the previous state-of-the-art in both F1 score and calibration error, while achieving an inference speedup of over 50 times. We also demonstrate the generalization capabilities of our model by evaluating it in zero-shot and few-shot settings on five specialized-domain entity typing datasets that are unseen during training. Remarkably, our model outperforms large language models with 10 times more parameters in the zero-shot setting, and when fine-tuned on 50 examples, it significantly outperforms ChatGPT on all datasets. Our code, models, and demo are available at https://github.com/yanlinf/CASENT.
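
To illustrate the decoding pipeline described above, the following is a minimal sketch rather than the CASENT implementation. It assumes a toy scoring function (toy_log_prob) in place of a trained seq2seq model, a prefix trie over a hypothetical four-type vocabulary to constrain beam search to valid type names, and Platt scaling as a stand-in for the calibration step, since the abstract does not specify the calibration method. All names, probabilities, and the parameters a and b are illustrative assumptions.

```python
import math
from typing import Callable, List, Tuple

END = "</s>"  # hypothetical end-of-type marker


def build_trie(type_names: List[List[str]]) -> dict:
    """Build a prefix trie over tokenized type names."""
    root: dict = {}
    for tokens in type_names:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_beam_search(
    log_prob_fn: Callable[[Tuple[str, ...], str], float],
    trie: dict,
    beam_size: int,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return up to beam_size complete types with sequence log-probabilities."""
    beams = [((), 0.0, trie)]  # (prefix, log-probability, trie node)
    finished = []
    while beams:
        candidates = []
        for prefix, score, node in beams:
            # Only tokens that continue a valid type name are expanded.
            for tok, child in node.items():
                new_score = score + log_prob_fn(prefix, tok)
                if tok == END:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + (tok,), new_score, child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda f: f[1], reverse=True)
    return finished[:beam_size]


def platt_confidence(seq_log_prob: float, a: float, b: float) -> float:
    """Map a raw sequence log-probability to a confidence in (0, 1).

    Platt scaling is an assumed stand-in; a and b would normally be fit
    on held-out data.
    """
    return 1.0 / (1.0 + math.exp(-(a * seq_log_prob + b)))


def toy_log_prob(prefix: Tuple[str, ...], token: str) -> float:
    """Hypothetical scorer standing in for a trained seq2seq model."""
    if not prefix:
        return math.log({"person": 0.5, "music": 0.4, "athlete": 0.1}[token])
    if prefix == ("music",):
        return math.log({"ian": 0.7, END: 0.3}[token])
    return 0.0  # log(1.0): every other continuation is deterministic


# Hypothetical tokenized type vocabulary.
trie = build_trie([["person"], ["music"], ["music", "ian"], ["athlete"]])
results = constrained_beam_search(toy_log_prob, trie, beam_size=2)
for tokens, log_p in results:
    conf = platt_confidence(log_p, a=2.0, b=1.5)  # illustrative a, b
    print("".join(tokens), round(math.exp(log_p), 3), round(conf, 3))
```

Because every expansion follows an edge of the trie, the decoder can only emit strings that are valid type names, and each beam yields one type with its own sequence probability. In practice the calibration parameters would be fit on a development set rather than chosen by hand.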
