Entity Set Expansion is an important NLP task that aims to expand a small seed set of entities into a larger one by drawing items from a large pool of candidates. In this paper, we propose GausSetExpander, an unsupervised approach based on optimal transport techniques. We re-frame the problem as choosing the entity that best completes the seed set. To this end, we interpret a set as an elliptical distribution whose centroid represents the mean and whose spread is represented by the scale parameter. The best entity is the one that increases the spread of the set the least. We demonstrate the validity of our approach by comparing it to state-of-the-art approaches.
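The selection rule can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: entities are given as fixed embedding vectors, and the spread of a set is measured as the trace of its empirical covariance (the mean squared distance to the centroid). The scale parameter and optimal transport formulation actually used by GausSetExpander may differ, and the function names `spread` and `best_completion` are hypothetical.

```python
import numpy as np


def spread(points: np.ndarray) -> float:
    """Assumed spread measure: trace of the empirical covariance,
    i.e. the mean squared distance of the points to their centroid."""
    centered = points - points.mean(axis=0)
    return float((centered ** 2).sum() / len(points))


def best_completion(seed: np.ndarray, candidates: np.ndarray) -> int:
    """Return the index of the candidate whose addition to the seed set
    increases the (assumed) spread the least."""
    base = spread(seed)
    increases = [
        spread(np.vstack([seed, cand[None, :]])) - base
        for cand in candidates
    ]
    return int(np.argmin(increases))


# Toy 2-D embeddings (illustrative values, not data from the paper).
seed = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[0.5, 0.5], [5.0, 5.0], [-3.0, 2.0]])
print(best_completion(seed, candidates))
```

In this toy example the first candidate lies inside the region covered by the seed set, so adding it does not widen the set, whereas the other two candidates lie far from the centroid and increase the spread substantially.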