Much of the success of modern language models depends on finding a suitable prompt to instruct the model. Until now, it has been largely unknown how variations in the linguistic expression of prompts affect these models. This study systematically and empirically evaluates which linguistic features influence models through paraphrase types, i.e., different linguistic changes at particular positions. We measure behavioral changes for five models across 120 tasks and six families of paraphrases (i.e., morphology, syntax, lexicon, lexico-syntax, discourse, and others). We also control for other prompt engineering factors (e.g., prompt length, lexical diversity, and proximity to training data). Our results show a potential for language models to improve task performance when their prompts are adapted using specific paraphrase types (e.g., a 6.7% median gain for Mixtral 8x7B; 5.5% for LLaMA 3 8B). In particular, changes in morphology and lexicon, i.e., the vocabulary used, showed promise in improving prompts. These findings contribute to developing more robust language models capable of handling variability in linguistic expression.