Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often not publicly available, and their immense size makes them difficult to fine-tune on common hardware. As a result, effectively tuning these models on large-scale supervised data can be challenging. In-Context Learning (ICL), the usual alternative, can use only a small number of supervised examples because of context-length limits. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
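To make "black-box LLMs working with locally fine-tuned smaller models" more concrete, the sketch below shows one plausible way to build such a prompt. It assumes that the small ("plug-in") model's predicted label and confidence are written next to each in-context example and next to the test input, so the LLM can weigh the small model's output when producing the final label. The template text ("Input:", "Small model prediction:", "Label:") and the names `Example`, `format_example`, `build_supericl_prompt`, and `toy_plugin` are hypothetical illustrations, not the paper's exact prompt or code. The call to the black-box LLM itself is deliberately left out, since no particular API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical plug-in model interface: text -> (predicted label, confidence).
PluginModel = Callable[[str], Tuple[str, float]]


@dataclass
class Example:
    text: str
    label: str


def format_example(text: str, pred: str, conf: float,
                   label: Optional[str] = None) -> str:
    """Render one example; the gold label is omitted for the test input."""
    lines = [
        f"Input: {text}",
        # Assumed template: small-model prediction shown with its confidence.
        f"Small model prediction: {pred} (Confidence: {conf:.2f})",
        f"Label: {label}" if label is not None else "Label:",
    ]
    return "\n".join(lines)


def build_supericl_prompt(demos: List[Example], test_text: str,
                          plugin: PluginModel) -> str:
    """Annotate demonstrations and the test input with plug-in predictions."""
    parts = []
    for ex in demos:
        pred, conf = plugin(ex.text)
        parts.append(format_example(ex.text, pred, conf, ex.label))
    pred, conf = plugin(test_text)
    parts.append(format_example(test_text, pred, conf))
    # The resulting string would be sent to the black-box LLM (not shown).
    return "\n\n".join(parts)


def toy_plugin(text: str) -> Tuple[str, float]:
    # Stand-in for a locally fine-tuned classifier (hypothetical rule).
    return ("positive", 0.9) if "good" in text else ("negative", 0.8)


demos = [Example("a good movie", "positive"), Example("a dull plot", "negative")]
prompt = build_supericl_prompt(demos, "good acting", toy_plugin)
print(prompt)
```

In this sketch the LLM only sees text, so it can stay a black box, while all task-specific fine-tuning is confined to the small local model.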