Large language models (LLMs) can perform new tasks well without additional training when given natural language prompts that demonstrate how the task should be performed. Prompt ensemble methods harness the knowledge of LLMs more comprehensively, mitigating the biases and errors of individual prompts and further improving performance. However, more prompts do not necessarily lead to better results, and not all prompts are beneficial: a small number of high-quality prompts often outperforms a large number of low-quality ones. Currently, there is no suitable method for evaluating how individual prompts affect the results. In this paper, we use the Shapley value to fairly quantify the contribution of each prompt, which helps identify beneficial and detrimental prompts and could potentially guide prompt valuation in data markets. Through extensive experiments with various ensemble methods and utility functions on diverse tasks, we validate the effectiveness of applying the Shapley value to prompts, showing that it distinguishes and quantifies the contribution of each prompt.
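To make the idea concrete, the sketch below computes exact Shapley values for a handful of prompts by enumerating every coalition and weighting each marginal contribution by |S|!(n-|S|-1)!/n!. It is an illustration under stated assumptions, not the paper's implementation: the ensemble (majority vote, with ties counted as wrong), the utility (validation accuracy, with the empty ensemble scoring 0), and all names and data (`shapley_values`, `majority_vote_accuracy`, `p_good`, `p_okay`, `p_bad`, `LABELS`, `PREDICTIONS`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical toy data: gold labels and each prompt's predictions on 5 examples.
LABELS = [1, 0, 1, 1, 0]
PREDICTIONS = {
    "p_good": [1, 0, 1, 1, 0],  # always correct
    "p_okay": [1, 0, 0, 1, 0],  # one mistake
    "p_bad": [0, 1, 0, 0, 1],   # always wrong
}


def majority_vote_accuracy(coalition: FrozenSet[str]) -> float:
    """Assumed utility: accuracy of a majority-vote ensemble over the coalition."""
    if not coalition:
        return 0.0  # assumption: an empty ensemble answers nothing correctly
    correct = 0
    for i, label in enumerate(LABELS):
        ranked = Counter(PREDICTIONS[p][i] for p in coalition).most_common()
        # Assumption: a tie for the top vote count is an abstention (counted wrong).
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue
        if ranked[0][0] == label:
            correct += 1
    return correct / len(LABELS)


def shapley_values(
    players: Sequence[str], utility: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values; needs O(2^n) utility evaluations."""
    n = len(players)
    cache: Dict[FrozenSet[str], float] = {}

    def v(coalition: FrozenSet[str]) -> float:
        # Memoize so each coalition's utility is computed only once.
        if coalition not in cache:
            cache[coalition] = utility(coalition)
        return cache[coalition]

    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of a coalition S that excludes p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v(s | {p}) - v(s))  # marginal contribution of p
        phi[p] = total
    return phi


phi = shapley_values(list(PREDICTIONS), majority_vote_accuracy)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
# prints p_good: +0.600, p_okay: +0.500, p_bad: -0.300 (one per line)
```

In this toy setting the always-wrong prompt receives a negative value, marking it as detrimental, and the values sum to the full ensemble's utility (0.8), which is the Shapley efficiency property. Exact enumeration grows exponentially with the number of prompts, so larger prompt sets typically call for a sampling-based approximation such as Monte Carlo permutation sampling.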