As large language models grow more capable, prompting has become the dominant way to access them. This has motivated the development of strategies for automatically selecting effective language prompts. In this paper, we introduce prompt flatness, a new metric for quantifying the expected utility of a language prompt. The metric is inspired by flatness regularization in statistical learning, which quantifies a model's robustness to perturbations of its parameters. We provide theoretical foundations for this metric and its relationship to other prompt selection metrics, yielding a comprehensive understanding of existing methods. Empirically, we show that combining prompt flatness with existing metrics improves both performance and sample efficiency. Our metric outperforms previous prompt selection metrics, with an average increase of 5% in accuracy and 10% in Pearson correlation across 6 classification benchmarks.
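To make the idea of flatness concrete, the following is a minimal sketch of one way a perturbation-based score could be computed. It is not the paper's definition: the toy linear-softmax model standing in for a language model, the Gaussian noise on the parameters, the KL divergence used to compare outputs, and the names `predict` and `sharpness` are all assumptions made for illustration. A lower score means the model's outputs for a given prompt change less when its parameters are perturbed, which is to say the prompt sits in a flatter region.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict(theta, feats):
    # Hypothetical stand-in for a language model: a linear layer plus softmax.
    # feats has shape (n_inputs, n_features); theta has shape (n_features, n_classes).
    return softmax(feats @ theta)


def sharpness(theta, feats, sigma=0.05, n_samples=64, seed=0):
    # Monte Carlo estimate of how much the output distribution moves when
    # Gaussian noise of scale sigma is added to the parameters (assumed form).
    rng = np.random.default_rng(seed)
    base = predict(theta, feats)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.normal(0.0, sigma, size=theta.shape)
        pert = predict(theta + eps, feats)
        # KL(base || pert) per input, then averaged over inputs.
        kl = np.sum(base * (np.log(base + 1e-12) - np.log(pert + 1e-12)), axis=-1)
        total += kl.mean()
    return total / n_samples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta = rng.normal(size=(8, 3))
    # Two hypothetical prompts, each represented by the features it induces
    # on the same five unlabeled inputs.
    prompt_feats = {
        "prompt_a": rng.normal(size=(5, 8)),
        "prompt_b": 3.0 * rng.normal(size=(5, 8)),
    }
    scores = {name: sharpness(theta, f) for name, f in prompt_feats.items()}
    # Under this sketch, the prompt with the lowest score is the flattest.
    print(min(scores, key=scores.get), scores)
```

Because the score needs only model outputs and not labels, a quantity of this kind can be computed on unlabeled inputs and, as the abstract suggests, combined with other selection metrics rather than used in isolation.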