In-context learning can help Large Language Models (LLMs) adapt to new tasks without additional training. However, this performance depends heavily on the quality of the demonstrations, driving research into effective demonstration selection algorithms to optimize this process. These algorithms help users select the best $k$ input-label pairs (demonstration examples) for a given test input, enabling LLMs to learn in context the relationship between the provided examples and the test input. Despite the many demonstration selection algorithms that have been proposed, their efficiency and effectiveness remain unclear. This lack of clarity makes it difficult to apply these algorithms in real-world scenarios and poses challenges for future research aimed at developing improved methods. This paper revisits six proposed algorithms, evaluating them on five datasets from both efficiency and effectiveness perspectives. Our experiments reveal significant variations in algorithm performance across different tasks, with some methods struggling to outperform random selection in certain scenarios. We also find that increasing the number of demonstrations does not always lead to better performance, and that there are often trade-offs between accuracy and computational efficiency. Our code is available at https://github.com/Tizzzzy/Demonstration_Selection_Overview.
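To make the selection problem concrete, the sketch below shows one simple baseline family the abstract alludes to: retrieving the $k$ demonstrations most similar to the test input. This is a minimal illustration only, using bag-of-words cosine similarity as an assumed scoring function; the six algorithms evaluated in the paper use more sophisticated criteria.

```python
from collections import Counter
import math

def bow_vector(text):
    """Bag-of-words term counts for a piece of text (toy featurizer)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_demonstrations(pool, test_input, k):
    """Pick the k input-label pairs whose inputs are most similar
    to the test input, to be placed in the LLM prompt as demonstrations."""
    query = bow_vector(test_input)
    ranked = sorted(pool,
                    key=lambda ex: cosine(bow_vector(ex[0]), query),
                    reverse=True)
    return ranked[:k]

# Hypothetical labeled pool for a sentiment task
pool = [
    ("the movie was great", "positive"),
    ("terrible acting and plot", "negative"),
    ("an enjoyable great film", "positive"),
]
demos = select_demonstrations(pool, "was the film great", k=2)
# The two lexically closest examples are retrieved as demonstrations.
```

Random selection, the baseline the paper compares against, would simply replace the similarity ranking with `random.sample(pool, k)`.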