We introduce Sentences Involving Complex Compositional Knowledge (SICCK), a synthetic dataset, together with a novel analysis of how well Natural Language Inference (NLI) models understand compositionality in logic. We produce 1,304 sentence pairs by modifying 15 examples from the SICK dataset (Marelli et al., 2014). Specifically, we modify the original texts with a set of phrases: modifiers that correspond to universal quantifiers, existential quantifiers, negation, and other concept modifiers in Natural Logic (NL) (MacCartney, 2009). We apply these phrases to the subject, verb, and object of the premise and the hypothesis, and then annotate the modified texts with the corresponding entailment labels following NL rules. We conduct a preliminary evaluation of how well neural NLI models capture these changes in structural and semantic composition, in both zero-shot and fine-tuned settings. In the zero-shot setting, NLI models perform poorly, especially on modified sentences with negation and existential quantifiers. After fine-tuning on this dataset, the models continue to perform poorly on negation, existential, and universal modifiers.
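The construction procedure described above (applying modifier phrases to the subject, verb, and object of a premise or hypothesis) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Clause` structure, the `SLOT_MODIFIERS` table, the specific modifier phrases, and the example sentences are hypothetical and are not taken from SICCK or SICK. The sketch only generates candidate pairs; the entailment label is left empty because, per the abstract, labels are assigned by annotation following NL rules.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clause:
    """A toy subject-verb-object sentence with slots that modifiers can fill."""
    subject_det: str
    subject: str
    aux: str
    verb: str
    object_det: str
    obj: str
    verb_mod: str = ""  # empty means the verb is unmodified

    def text(self) -> str:
        words = [self.subject_det, self.subject, self.aux, self.verb_mod,
                 self.verb, self.object_det, self.obj]
        sentence = " ".join(w for w in words if w)
        return sentence[0].upper() + sentence[1:] + "."


# Hypothetical modifier phrases, grouped by slot and by modifier category.
SLOT_MODIFIERS = {
    "subject_det": {"universal": ["every"], "existential": ["some"], "negation": ["no"]},
    "verb_mod": {"negation": ["not"]},
    "object_det": {"universal": ["every"], "existential": ["some"], "negation": ["no"]},
}


def variants(clause):
    """Return (slot, category, modified_clause) for each single-slot modification."""
    out = []
    for slot, by_category in SLOT_MODIFIERS.items():
        for category, phrases in by_category.items():
            for phrase in phrases:
                out.append((slot, category, replace(clause, **{slot: phrase})))
    return out


def build_pairs(premise, hypothesis):
    """Modify one side at a time; labels are left for NL-rule annotation."""
    pairs = []
    for side in ("premise", "hypothesis"):
        base = premise if side == "premise" else hypothesis
        for slot, category, modified in variants(base):
            pairs.append({
                "premise": modified.text() if side == "premise" else premise.text(),
                "hypothesis": modified.text() if side == "hypothesis" else hypothesis.text(),
                "modified_side": side,
                "slot": slot,
                "category": category,
                "label": None,  # not computed here; assigned by annotation
            })
    return pairs


# Illustrative sentences only, not drawn from the dataset.
premise = Clause("a", "man", "is", "playing", "a", "guitar")
hypothesis = Clause("a", "man", "is", "playing", "an", "instrument")
pairs = build_pairs(premise, hypothesis)

for pair in pairs[:3]:
    print(pair["premise"], "|", pair["hypothesis"], "|", pair["category"])
```

Modifying one slot on one side at a time keeps each pair attributable to a single modifier category, which is what a per-category breakdown of model errors (negation, existential, universal) would require.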