Semantic similarity measures are widely used in natural language processing to support a variety of downstream tasks. However, no single semantic similarity measure is the most appropriate for all tasks, and researchers often use ensemble strategies to ensure performance. This work proposes a method for automatically designing semantic similarity ensembles. The method uses grammatical evolution, for the first time, to automatically select and aggregate measures from a pool of candidates, creating an ensemble that maximizes correlation with human judgment. The method is evaluated on several benchmark datasets and compared with state-of-the-art ensembles; the results show that it can significantly improve similarity assessment accuracy and, in some cases, outperform existing methods. Our research thus demonstrates the potential of grammatical evolution for automatic text comparison and the benefits of ensembles for semantic similarity tasks. The source code that illustrates our approach can be downloaded from https://github.com/jorge-martinez-gil/sesige.
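To make the idea of selecting and aggregating measures with grammatical evolution more concrete, the following is a minimal sketch, not the authors' implementation (which is available at the repository above). Everything in it is an assumption made for illustration: the tiny grammar (an expression is either a base measure or a `max`, `min`, or `avg` of two sub-expressions), the integer-codon genome, the depth limit, the simple truncation selection with one-point crossover and single-codon mutation, and all function names (`decode`, `evaluate`, `pearson`, `fitness`, `evolve`). The fitness is the Pearson correlation between the ensemble's scores and the human scores.

```python
import random

# Hypothetical aggregation operators allowed by the illustrative grammar.
AGGREGATORS = {
    "avg": lambda a, b: (a + b) / 2.0,
    "max": max,
    "min": min,
}
AGG_NAMES = sorted(AGGREGATORS)  # fixed order so decoding is deterministic


def decode(genome, n_measures, max_depth=3):
    """Map integer codons to an expression tree with the modulo rule."""
    pos = 0

    def next_codon():
        nonlocal pos
        codon = genome[pos % len(genome)]  # wrap around when codons run out
        pos += 1
        return codon

    def expr(depth):
        # Even codon (or depth limit reached): pick a base measure.
        if depth >= max_depth or next_codon() % 2 == 0:
            return ("m", next_codon() % n_measures)
        # Odd codon: pick an aggregator and expand two sub-expressions.
        name = AGG_NAMES[next_codon() % len(AGG_NAMES)]
        return (name, expr(depth + 1), expr(depth + 1))

    return expr(0)


def evaluate(tree, scores):
    """Compute the ensemble score for one text pair from its base scores."""
    if tree[0] == "m":
        return scores[tree[1]]
    return AGGREGATORS[tree[0]](evaluate(tree[1], scores),
                                evaluate(tree[2], scores))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def fitness(genome, data, human):
    """Correlation between the decoded ensemble and the human scores."""
    tree = decode(genome, len(data[0]))
    return pearson([evaluate(tree, row) for row in data], human)


def evolve(data, human, pop_size=30, genome_len=12, generations=40, seed=0):
    """Evolve genomes; returns the best expression tree and its fitness."""
    rng = random.Random(seed)
    pop = [[rng.randrange(256) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data, human), reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(genome_len)] = rng.randrange(256)  # mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda g: fitness(g, data, human))
    return decode(best, len(data[0])), fitness(best, data, human)
```

Here `data` is assumed to be a list with one row per text pair, each row holding the scores produced by the candidate measures, and `human` is the list of corresponding human ratings. Calling `evolve(data, human)` returns a nested tuple such as `("avg", ("m", 0), ("m", 2))` together with its correlation on the given data. A realistic setup would use a richer grammar and evaluate the evolved ensemble on held-out pairs.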