Large language models adapt to new tasks through in-context learning (ICL) without parameter updates. Current theoretical explanations for this capability assume test tasks are drawn from a distribution similar to that seen during pretraining. This assumption overlooks adversarial distribution shifts that threaten real-world reliability. To address this gap, we introduce a distributionally robust meta-learning framework that provides worst-case performance guarantees for ICL under Wasserstein-based distribution shifts. Focusing on linear self-attention Transformers, we derive a non-asymptotic bound linking adversarial perturbation strength ($\rho$), model capacity ($m$), and the number of in-context examples ($N$). The analysis reveals that model robustness scales with the square root of its capacity ($\rho_{\max} \propto \sqrt{m}$), while adversarial settings impose a sample complexity penalty proportional to the square of the perturbation magnitude ($N_\rho - N_0 \propto \rho^2$). Experiments on synthetic tasks confirm these scaling laws. These findings advance the theoretical understanding of ICL's limits under adversarial conditions and suggest that model capacity serves as a fundamental resource for distributional robustness.
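To make the two stated scaling relations concrete, the following minimal Python sketch evaluates them numerically. The functional forms ($\rho_{\max} \propto \sqrt{m}$ and $N_\rho - N_0 \propto \rho^2$) are taken from the abstract; the proportionality constants `c_rho`, `c_N`, and the baseline `N_0` are purely hypothetical placeholders for illustration, not values derived in the paper.

```python
import numpy as np

# Illustrative sketch of the two scaling laws stated above.
# The constants below are ASSUMPTIONS for demonstration only; the paper
# asserts the functional forms rho_max ∝ sqrt(m) and N_rho - N_0 ∝ rho^2,
# not these particular values.
c_rho = 0.5   # hypothetical constant in rho_max = c_rho * sqrt(m)
c_N = 10.0    # hypothetical constant in N_rho = N_0 + c_N * rho**2
N_0 = 100     # hypothetical sample complexity without adversarial shift

def rho_max(m: int) -> float:
    """Largest tolerable perturbation strength for capacity m (assumed form)."""
    return c_rho * np.sqrt(m)

def sample_complexity(rho: float) -> float:
    """In-context examples needed under perturbation strength rho (assumed form)."""
    return N_0 + c_N * rho ** 2

for m in (64, 256, 1024):
    print(f"capacity m={m:5d}: rho_max ≈ {rho_max(m):.2f}")

for rho in (0.0, 0.5, 1.0, 2.0):
    penalty = sample_complexity(rho) - N_0
    print(f"rho={rho:.1f}: N_rho ≈ {sample_complexity(rho):.0f} (penalty {penalty:.0f})")
```

Under these assumed constants, quadrupling the capacity doubles the tolerable perturbation, and doubling the perturbation quadruples the extra in-context examples required, which is exactly the qualitative behavior the bound predicts.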