Despite their impressive performance on a wide variety of tasks, modern language models remain susceptible to distribution shifts, exhibiting brittle behavior when evaluated on data whose distribution differs from that of their training data. In this paper, we describe how distribution shifts in language models can be separated into observable and unobservable components, and we discuss how established approaches for handling distribution shift address only the former. Importantly, we show that the omitted variable bias induced by the unobserved component can compromise both the evaluation and the optimization of language models. To address this challenge, we introduce a framework that maps the strength of the omitted variables to bounds on the worst-case generalization performance of language models under distribution shift. In empirical experiments, we show that using these bounds directly in language model evaluation and optimization yields more principled measures of out-of-distribution performance, improves true out-of-distribution performance relative to standard distribution-shift adjustment methods, and further enables inference about the strength of the omitted variables when target-distribution labels are available.
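As a concrete illustration of the kind of bound the framework refers to, consider the following minimal sketch; the notation ($w$, $\Gamma$, $\ell$, $\mathcal{W}(\Gamma)$) is our own illustrative assumption rather than the paper's derivation. Suppose the observable component of the shift is captured by an importance weight $w(x)$ over observed covariates, and the strength of the omitted variables is summarized by a parameter $\Gamma \ge 1$ bounding how far the full weight $\tilde{w}(x,u)$ may deviate multiplicatively from $w(x)$. The worst-case target-distribution risk of a model $f$ under loss $\ell$ is then bracketed by
\[
\sup_{\tilde{w} \in \mathcal{W}(\Gamma)} \mathbb{E}_{P_{\mathrm{source}}}\!\big[\tilde{w}(X,U)\,\ell(f(X),Y)\big],
\qquad
\mathcal{W}(\Gamma) = \Big\{\tilde{w} :\ \tfrac{1}{\Gamma}\,w(X) \le \tilde{w} \le \Gamma\,w(X),\ \ \mathbb{E}_{P_{\mathrm{source}}}[\tilde{w}] = 1 \Big\},
\]
so that $\Gamma = 1$ recovers standard reweighting for the observable shift alone, and larger $\Gamma$ widens the bound as the omitted variables grow stronger.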