Textual entailment recognition is one of the basic natural language understanding (NLU) tasks. Understanding the meaning of sentences is a prerequisite for applying any natural language processing (NLP) technique to recognize textual entailment automatically. A text entails a hypothesis if and only if the truth of the hypothesis follows from the text. Classical approaches generally represent sentences using the feature values of each word taken from word embeddings. In this paper, we propose a novel approach to identifying the textual entailment relationship between a text and a hypothesis, introducing a new semantic feature based on an empirical threshold-based semantic text representation. We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair. We carried out several experiments on a benchmark entailment classification dataset (SICK-RTE). We train several machine learning (ML) algorithms, applying both semantic and lexical features, to classify each text-hypothesis pair as entailment, neutral, or contradiction. Our empirical sentence representation technique enriches the semantic information of the texts and hypotheses and is found to be more efficient than the classical ones. Overall, our approach significantly outperforms known methods in understanding the meaning of sentences for the textual entailment classification task.
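To make the element-wise Manhattan distance feature concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each sentence is reduced to a fixed-length vector by averaging its word vectors (the abstract does not specify the pooling step, and the empirical threshold-based representation is not reproduced here). The names `sentence_vector`, `manhattan_feature`, and `toy_embeddings` are hypothetical, and the embedding values are invented for illustration only.

```python
def sentence_vector(tokens, embeddings, dim):
    """Average the word vectors of the known tokens (assumed pooling step)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known tokens: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def manhattan_feature(text_vec, hyp_vec):
    """Return the element-wise absolute difference of two vectors.

    Summing the result gives the scalar Manhattan (L1) distance; keeping
    it as a vector preserves one feature per embedding dimension.
    """
    if len(text_vec) != len(hyp_vec):
        raise ValueError("vectors must have the same dimension")
    return [abs(a - b) for a, b in zip(text_vec, hyp_vec)]


# Toy 3-dimensional embeddings, made up for this example.
toy_embeddings = {
    "dog": [1.0, 0.0, 2.0],
    "cat": [1.0, 0.5, 0.0],
    "runs": [0.0, 1.0, 0.0],
}

text_vec = sentence_vector(["dog", "runs"], toy_embeddings, dim=3)
hyp_vec = sentence_vector(["cat", "runs"], toy_embeddings, dim=3)
feature = manhattan_feature(text_vec, hyp_vec)
```

Under these assumptions, `feature` would be passed, possibly alongside lexical features, to a classifier that predicts entailment, neutral, or contradiction. Keeping the per-dimension differences instead of their sum lets the classifier weight each embedding dimension separately.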