Ultra-fine entity typing (UFET) is the task of inferring the semantic types, from a large set of fine-grained candidates, that apply to a given entity mention. This task is especially challenging because we only have a small number of training examples for many of the types, even with distant supervision strategies. State-of-the-art models, therefore, have to rely on prior knowledge about the type labels in some way. In this paper, we show that the performance of existing methods can be improved using a simple technique: we use pre-trained label embeddings to cluster the labels into semantic domains and then treat these domains as additional types. We show that this strategy consistently leads to improved results, as long as high-quality label embeddings are used. We furthermore use the label clusters as part of a simple post-processing technique, which results in further performance gains. Both strategies treat the UFET model as a black box and can thus straightforwardly be used to improve a wide range of existing models.
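As a rough illustration of the first strategy, the sketch below clusters label embeddings into domains and adds each gold label's domain to the training label set. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name a clustering algorithm, so plain k-means is used here as a stand-in, and the 2-d vectors, the `domain_<id>` naming, and the helper names `kmeans`, `build_domains`, and `augment` are all hypothetical.

```python
import math


def kmeans(vectors, k, iters=20):
    """Plain k-means (an assumed choice) with deterministic initialisation:
    the first k vectors serve as the initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def build_domains(label_embeddings, k):
    """Map every type label to a hypothetical domain name 'domain_<cluster id>'."""
    labels = sorted(label_embeddings)
    assign = kmeans([label_embeddings[label] for label in labels], k)
    return {label: f"domain_{c}" for label, c in zip(labels, assign)}


def augment(gold_types, label_to_domain):
    """Treat domains as additional types: add the domain of each gold label."""
    domains = {label_to_domain[t] for t in gold_types if t in label_to_domain}
    return set(gold_types) | domains


# Toy 2-d vectors standing in for pre-trained label embeddings (illustrative only).
toy_embeddings = {
    "singer": [0.9, 0.1],
    "musician": [1.0, 0.0],
    "city": [0.0, 1.0],
    "town": [0.1, 0.9],
}
label_to_domain = build_domains(toy_embeddings, k=2)
augmented = augment({"singer"}, label_to_domain)
```

Because the augmentation only rewrites the label sets, the underlying UFET model can be trained on the enlarged label inventory without modification, which is in keeping with the black-box framing above.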