Machine learning models can achieve high performance on benchmark natural language processing (NLP) datasets yet fail in more challenging settings. We study this problem in natural language inference (NLI), the task of determining the logical relationship between a pair of text sequences, where a pre-trained model can learn dataset artifacts. We present several techniques for analyzing and locating dataset artifacts in the crowdsourced Stanford Natural Language Inference (SNLI) corpus, and we characterize the stylistic patterns of these artifacts. To mitigate them, we employ a novel multi-scale data augmentation technique that combines two frameworks: a behavioral testing checklist at the sentence level and lexical synonym criteria at the word level. The combined method makes our model more robust under perturbation testing, where it consistently outperforms the pre-trained baseline.
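As an illustration of the word-level side of the augmentation, the following is a minimal sketch of synonym substitution applied to an NLI example. It is not the implementation used in this work: the synonym table `SYNONYMS`, the function `augment_pair`, and its `rate` and `seed` parameters are hypothetical names introduced here, and the sketch assumes that swapping a word for a close synonym leaves the NLI label unchanged, which would need to be checked in practice.

```python
import random

# Hypothetical toy synonym table, hand-written for illustration only.
# A real setup would likely draw candidates from a lexical resource.
SYNONYMS = {
    "man": ["gentleman"],
    "big": ["large"],
    "dog": ["hound"],
}


def augment_pair(premise, hypothesis, label, synonyms=SYNONYMS, rate=1.0, seed=0):
    """Return a copy of an NLI example with some words swapped for synonyms.

    Each whitespace-separated token found in `synonyms` is replaced with
    probability `rate`. The label is passed through unchanged, under the
    assumption that synonym swaps preserve the premise-hypothesis relation.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible

    def swap(sentence):
        out = []
        for token in sentence.split():
            candidates = synonyms.get(token.lower())
            if candidates and rng.random() < rate:
                out.append(rng.choice(candidates))
            else:
                out.append(token)
        return " ".join(out)

    return swap(premise), swap(hypothesis), label


if __name__ == "__main__":
    example = ("A man walks a big dog", "A person is outside", "entailment")
    print(augment_pair(*example))
```

The augmented pairs would be added to the training set alongside the originals; sentence-level behavioral tests would then be applied separately to probe whether the model's predictions stay stable under such perturbations.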