Recent studies have demonstrated that large language models (LLMs) can excel at many tasks via in-context learning (ICL). However, recent work shows that ICL-prompted models tend to produce inaccurate results when presented with adversarial inputs. In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets covering natural language inference and paraphrase identification. We prompt LLMs with a small set of human-generated NLEs to produce further NLEs, which yields more accurate results than both a zero-shot ICL setting and using only human-generated NLEs. Our results on five popular LLMs (GPT-3.5-turbo, Llama2, Vicuna, Zephyr, and Mistral) show that our approach yields an improvement of over 6% over baseline approaches on eight adversarial datasets: HANS, ISCS, NaN, ST, PICD, PISP, ANLI, and PAWS. Furthermore, previous studies have demonstrated that prompt selection strategies significantly enhance ICL on in-distribution test sets. However, our findings reveal that these strategies do not match the efficacy of our approach in robustness evaluations, yielding accuracy 8% lower than the proposed approach.
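The exact prompt format is not specified in the abstract; the following is a minimal sketch of how a few-shot ICL prompt augmented with NLEs might be assembled for a natural language inference instance. The function name, field names, and template wording are illustrative assumptions, not the paper's actual implementation.

```python
def build_nle_prompt(demonstrations, test_premise, test_hypothesis):
    """Assemble a few-shot NLI prompt in which each human-written demonstration
    carries a natural language explanation (NLE) alongside its label, so the
    model is steered to generate an explanation before committing to a label.

    NOTE: template and field names are hypothetical, for illustration only.
    """
    parts = []
    for d in demonstrations:
        parts.append(
            f"Premise: {d['premise']}\n"
            f"Hypothesis: {d['hypothesis']}\n"
            f"Explanation: {d['explanation']}\n"
            f"Label: {d['label']}\n"
        )
    # The test instance ends at "Explanation:" so the LLM continues by
    # producing a new NLE (and then a label) for the unseen example.
    parts.append(
        f"Premise: {test_premise}\n"
        f"Hypothesis: {test_hypothesis}\n"
        f"Explanation:"
    )
    return "\n".join(parts)
```

The resulting string would then be sent to an LLM as-is; the model's continuation supplies both the generated NLE and the predicted label.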