Diversity in demonstration selection is crucial for enhancing model generalization, as it enables broader coverage of structures and concepts. However, constructing an appropriate set of demonstrations remains an open research problem. This paper presents Relevance-Diversity Enhanced Selection (RDES), an approach that leverages reinforcement learning to optimize the selection of diverse reference demonstrations for text classification with Large Language Models (LLMs), particularly in few-shot prompting scenarios. RDES employs a Q-learning framework to dynamically identify demonstrations that maximize both diversity and relevance to the classification objective, computing a diversity score from the label distribution of the selected demonstrations. This ensures a balanced representation of reference data and leads to improved classification accuracy. Through extensive experiments on four benchmark datasets with 12 closed-source and open-source LLMs, we show that RDES significantly improves classification accuracy over ten established baselines. We further investigate incorporating Chain-of-Thought (CoT) reasoning into the prediction process, which yields additional gains in predictive performance. These results underscore the potential of reinforcement learning for adaptive demonstration selection and deepen the understanding of classification challenges.
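The core mechanism described above can be sketched in a few lines. This is a minimal, illustrative reconstruction only: the Jaccard token-overlap relevance measure, the entropy-based diversity score, the relevance/diversity reward weighting `lam`, and the collapsed (single-state) Q-table are all assumptions made for brevity, not the paper's actual formulation.

```python
import math
import random
from collections import Counter

def diversity_score(labels):
    """Normalized Shannon entropy of the label distribution among the
    selected demonstrations: 1.0 when labels are evenly spread,
    0.0 when all demonstrations share a single label."""
    counts = Counter(labels)
    if len(counts) <= 1:
        return 0.0
    total = len(labels)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))

def relevance_score(query, demo_text):
    """Toy Jaccard token overlap as a stand-in relevance measure (assumed)."""
    q, d = set(query.lower().split()), set(demo_text.lower().split())
    return len(q & d) / max(len(q | d), 1)

class DemoSelector:
    """Tabular Q-learning over demonstration indices, with the state
    collapsed to a single context for brevity; selects k demonstrations
    epsilon-greedily and rewards a blend of relevance and diversity."""

    def __init__(self, pool, k=3, alpha=0.1, gamma=0.9, epsilon=0.2, lam=0.5):
        self.pool = pool            # list of (text, label) pairs
        self.k = k
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.lam = epsilon, lam
        self.q = [0.0] * len(pool)  # one Q-value per candidate demonstration

    def select(self, query):
        chosen, candidates = [], list(range(len(self.pool)))
        for _ in range(self.k):
            if random.random() < self.epsilon:
                a = random.choice(candidates)       # explore
            else:
                a = max(candidates, key=lambda i: self.q[i])  # exploit
            chosen.append(a)
            candidates.remove(a)
        # Reward blends per-query relevance with set-level label diversity.
        div = diversity_score([self.pool[i][1] for i in chosen])
        for a in chosen:
            rel = relevance_score(query, self.pool[a][0])
            reward = self.lam * rel + (1 - self.lam) * div
            best_next = max(self.q)
            self.q[a] += self.alpha * (reward + self.gamma * best_next - self.q[a])
        return chosen
```

In this sketch, repeated calls to `select` shift Q-values toward demonstrations that are both textually relevant to incoming queries and drawn from a balanced mix of labels; in the paper the reward would instead come from the downstream classification outcome.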