In-context learning (ICL) with large language models has proven to be a powerful approach for many natural language processing tasks. However, selecting examples for ICL is nontrivial, because results can vary greatly with the quality, quantity, and order of the examples used. In this paper, we conduct a case study on text simplification (TS) to investigate how to select the best and most robust examples for ICL. We propose a Metric-Based in-context Learning (MBL) method that uses common TS metrics such as SARI, compression ratio, and BERT-Precision for selection. Through an extensive set of experiments with GPT models of various sizes on standard TS benchmarks such as TurkCorpus and ASSET, we show that examples selected by the top SARI scores perform best on larger models such as GPT-175B, while selection by compression ratio generally performs better on smaller models such as GPT-13B and GPT-6.7B. Furthermore, we demonstrate that MBL is generally robust to example orderings and out-of-domain test sets, and that it outperforms strong baselines and state-of-the-art finetuned language models. Finally, we show that the behaviour of large GPT models can be implicitly controlled by the chosen metric. Our research provides a new framework for selecting examples in ICL and demonstrates its effectiveness on text simplification tasks, breaking new ground for more accurate and efficient NLG systems.
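The selection idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the prompt template, and the toy sentence pairs are all hypothetical. Compression ratio is assumed here to be the character length of the simplification divided by that of the source, and the `highest` flag is left to the caller because the abstract does not say whether low or high values are preferred. A SARI or BERT-Precision scorer could be passed in place of `compression_ratio`.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (complex sentence, simplified sentence)


def compression_ratio(pair: Pair) -> float:
    """Assumed definition: len(simplification) / len(source), in characters."""
    source, simplified = pair
    return len(simplified) / max(len(source), 1)


def select_examples(pool: List[Pair], metric: Callable[[Pair], float],
                    k: int, highest: bool = True) -> List[Pair]:
    """Rank the candidate pool by `metric` and keep the k best-ranked pairs."""
    return sorted(pool, key=metric, reverse=highest)[:k]


def build_prompt(examples: List[Pair], query: str) -> str:
    """Format the selected pairs as demonstrations, then append the test input.

    The "Complex:/Simple:" template is a hypothetical choice for illustration.
    """
    shots = [f"Complex: {src}\nSimple: {simp}" for src, simp in examples]
    return "\n\n".join(shots + [f"Complex: {query}\nSimple:"])


# Toy candidate pool (invented sentences, not benchmark data).
pool = [
    ("The feline reposed upon the rug.", "The cat sat on the rug."),
    ("He purchased a vehicle.", "He bought a car."),
    ("Precipitation is anticipated tomorrow.", "Rain is expected tomorrow."),
]

# Keep the two pairs with the strongest compression (lowest ratio).
selected = select_examples(pool, compression_ratio, k=2, highest=False)
prompt = build_prompt(selected, "The committee deliberated at length.")
print(prompt)
```

Keeping the metric as a plain callable means the same selection routine can be reused for each metric under comparison, with only the scoring function and the sort direction changing.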