Semantic similarity measures are widely used in natural language processing to support a variety of computational tasks. However, no single measure is the most appropriate for every task, so researchers often rely on ensemble strategies to ensure good performance. This work proposes a method for automatically designing semantic similarity ensembles. The method uses grammatical evolution, for the first time, to automatically select and aggregate measures from a pool of candidates into an ensemble that maximizes correlation with human judgment. We evaluate the method on several benchmark datasets and compare it with state-of-the-art ensembles; the results show that it can significantly improve the accuracy of similarity assessment and, in some cases, outperform existing methods. Our research thus demonstrates the potential of grammatical evolution for automatically comparing text and confirms the benefits of ensembles for semantic similarity tasks.
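The selection-and-aggregation idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy grammar, the aggregation operators (`max`, `min`, `mean`), the use of Pearson correlation as the fitness criterion, the evolutionary parameters, and the small data set are all assumptions made purely for illustration.

```python
import math
import random

AGGS = ("max", "min", "mean")  # assumed aggregation operators

def decode(genome, n_measures, max_depth=4):
    """Map integer codons to an expression tree using a toy grammar:
    <expr> ::= <agg>(<expr>, <expr>) | <measure>"""
    pos = 0

    def next_codon():
        nonlocal pos
        codon = genome[pos % len(genome)]  # wrap around when codons run out
        pos += 1
        return codon

    def expr(depth):
        # An even codon expands to an aggregation, an odd one to a measure.
        if depth < max_depth and next_codon() % 2 == 0:
            op = AGGS[next_codon() % len(AGGS)]
            return (op, expr(depth + 1), expr(depth + 1))
        return ("m", next_codon() % n_measures)

    return expr(0)

def evaluate(tree, scores):
    """Compute the ensemble score for one text pair from its measure scores."""
    if tree[0] == "m":
        return scores[tree[1]]
    op, left, right = tree
    a, b = evaluate(left, scores), evaluate(right, scores)
    return {"max": max(a, b), "min": min(a, b), "mean": (a + b) / 2}[op]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def fitness(genome, X, y):
    """Correlation between ensemble output and human judgments."""
    tree = decode(genome, len(X[0]))
    return pearson([evaluate(tree, row) for row in X], y)

def evolve(X, y, pop_size=30, genome_len=12, generations=40, seed=0):
    """Elitist evolutionary loop with one-point crossover and codon mutation."""
    rng = random.Random(seed)

    def score(g):
        return fitness(g, X, y)

    pop = [[rng.randrange(256) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(256) if rng.random() < 0.1 else c
                     for c in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return decode(best, len(X[0])), score(best)

# Made-up toy data: each row holds three hypothetical measure scores for one
# text pair; TOY_HUMAN holds the corresponding hypothetical human ratings.
TOY_SCORES = [
    [0.9, 0.7, 0.8], [0.2, 0.4, 0.1], [0.6, 0.5, 0.9],
    [0.1, 0.3, 0.2], [0.8, 0.9, 0.6], [0.4, 0.2, 0.5],
]
TOY_HUMAN = [0.85, 0.15, 0.70, 0.10, 0.80, 0.35]

best_tree, best_corr = evolve(TOY_SCORES, TOY_HUMAN)
print(best_tree, round(best_corr, 3))
```

The returned tree is a nested tuple such as `("mean", ("m", 0), ("m", 2))`, read as the mean of measures 0 and 2. Because the better half of each generation is carried over unchanged, the best correlation in the population never decreases from one generation to the next.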