Semantic similarity measures are widely used in natural language processing to support a variety of computational tasks. However, no single semantic similarity measure is the most appropriate for every task, so researchers often rely on ensemble strategies to ensure good performance. This work proposes a method for automatically designing semantic similarity ensembles. Our method uses grammatical evolution, for the first time, to automatically select and aggregate measures from a pool of candidates, creating an ensemble that maximizes correlation with human judgment. The method is evaluated on several benchmark datasets and compared with state-of-the-art ensembles; the results show that it can significantly improve similarity assessment accuracy and, in some cases, outperform existing methods. Our research thus demonstrates the potential of grammatical evolution for automatically comparing text and confirms the benefits of using ensembles for semantic similarity tasks. The source code that illustrates our approach can be downloaded from https://github.com/jorge-martinez-gil/sesige.
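To make the idea of selecting and aggregating measures through grammatical evolution more concrete, the following is a minimal sketch and not the authors' implementation (for that, see the repository linked above). Everything in it is an assumption made for illustration: the tiny grammar, the three aggregation operators, the toy score values, and the names `decode`, `evaluate`, `pearson`, `fitness`, and `evolve`. A genome of integer codons is mapped through the grammar to an ensemble expression, and fitness is the Pearson correlation between the ensemble's scores and the (made-up) human ratings.

```python
import random
from math import sqrt

# Toy scores from three hypothetical similarity measures (one list each)
# for six text pairs, plus made-up "human" ratings. Illustrative values only.
MEASURES = [
    [0.9, 0.2, 0.6, 0.1, 0.8, 0.4],
    [0.7, 0.1, 0.9, 0.3, 0.6, 0.2],
    [0.8, 0.4, 0.5, 0.2, 0.9, 0.1],
]
HUMAN = [0.85, 0.15, 0.75, 0.2, 0.8, 0.25]

# Assumed aggregation operators; a real grammar could offer many more.
AGGREGATORS = {"max": max, "min": min, "mean": lambda a, b: (a + b) / 2}
AGG_NAMES = sorted(AGGREGATORS)  # fixed order: ['max', 'mean', 'min']


def decode(genome, max_depth=3):
    """Map integer codons to an expression tree using the grammar
    <expr> ::= <agg>(<expr>, <expr>) | <measure>."""
    pos = 0

    def next_codon():
        nonlocal pos
        codon = genome[pos % len(genome)]  # wrap around when codons run out
        pos += 1
        return codon

    def expr(depth):
        # An even codon picks the aggregation rule, unless the depth limit is hit.
        if depth < max_depth and next_codon() % 2 == 0:
            name = AGG_NAMES[next_codon() % len(AGG_NAMES)]
            return (name, expr(depth + 1), expr(depth + 1))
        return next_codon() % len(MEASURES)  # terminal: index of a measure

    return expr(0)


def evaluate(tree, i):
    """Score text pair i with the ensemble described by tree."""
    if isinstance(tree, int):
        return MEASURES[tree][i]
    name, left, right = tree
    return AGGREGATORS[name](evaluate(left, i), evaluate(right, i))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def fitness(genome):
    tree = decode(genome)
    scores = [evaluate(tree, i) for i in range(len(HUMAN))]
    return pearson(scores, HUMAN)


def evolve(generations=30, pop_size=20, genome_len=8, seed=0):
    """Simple elitist loop: keep the best half, refill with mutated crossovers."""
    rng = random.Random(seed)
    pop = [[rng.randrange(256) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(genome_len)] = rng.randrange(256)  # point mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return decode(best), fitness(best)


if __name__ == "__main__":
    ensemble, score = evolve()
    print("best ensemble:", ensemble, "correlation:", round(score, 3))
```

In this sketch an ensemble is a nested tuple such as `('max', 0, 1)`, meaning "take the larger of the scores from measures 0 and 1". A realistic setup would replace the toy lists with scores computed on a benchmark dataset and would evaluate the evolved ensemble on held-out pairs.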