The prompt-based pre-trained language model (PLM) paradigm has achieved substantial success in few-shot natural language processing (NLP) tasks. However, prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts, which is costly, inefficient, and subjective. Meanwhile, existing continuous prompt optimization methods improve performance by learning ideal prompts from the gradient information of PLMs, but their high computational cost, low readability, and limited generalizability are common concerns. To address this research gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt Optimization ($DP_2O$) method. We first design a multi-round dialogue alignment strategy based on GPT-4 to generate a readable prompt set. Furthermore, we propose an efficient prompt screening metric that identifies high-quality prompts with linear complexity. Finally, we construct a reinforcement learning (RL) framework based on policy gradients to match prompts to inputs optimally. By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on average on four open-source datasets. Moreover, subsequent experiments demonstrate that $DP_2O$ has good universality, robustness, and generalization ability.
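To make the idea of matching prompts to inputs with policy gradients concrete, the following is a minimal sketch rather than the $DP_2O$ implementation. It assumes a linear softmax policy over a fixed prompt set, a REINFORCE update with a moving-average baseline, and a stub reward function standing in for PLM feedback. The abstract does not specify the policy network architecture or the reward; all names here (`train_prompt_policy`, `select_prompts`, `toy_reward`) and the toy data are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def train_prompt_policy(features, reward_fn, n_prompts, epochs=200, lr=0.5, seed=0):
    """REINFORCE for a linear softmax policy that picks one prompt per input.

    Assumption: reward_fn(i, a) returns a scalar reward for using prompt `a`
    on input `i`; in practice this would come from the PLM's prediction.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = np.zeros((d, n_prompts))
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(epochs):
        probs = softmax(features @ W)  # shape (n, n_prompts)
        actions = np.array([rng.choice(n_prompts, p=p) for p in probs])
        rewards = np.array(
            [reward_fn(i, a) for i, a in enumerate(actions)], dtype=float
        )
        advantages = rewards - baseline
        # Gradient of log pi(a|x) w.r.t. the logits is onehot(a) - probs.
        grad_logits = -probs
        grad_logits[np.arange(n), actions] += 1.0
        # Gradient ascent on the expected reward.
        W += lr * features.T @ (grad_logits * advantages[:, None]) / n
        baseline = 0.9 * baseline + 0.1 * rewards.mean()
    return W


def select_prompts(features, W):
    # Greedy prompt choice at inference time.
    return softmax(features @ W).argmax(axis=1)


# Toy setup (hypothetical): two input clusters, three candidate prompts.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
best_prompt = [1, 1, 2, 2]  # the prompt that yields reward 1 for each input


def toy_reward(i, a):
    # Stub reward: 1 if the chosen prompt is the one assumed best, else 0.
    return 1.0 if a == best_prompt[i] else 0.0


W = train_prompt_policy(features, toy_reward, n_prompts=3)
chosen = select_prompts(features, W)
```

In this sketch the policy has only `d * n_prompts` parameters, which mirrors the spirit of training a small policy network instead of updating the PLM itself; the actual ratio reported above (0.67%) refers to the authors' network, not to this toy model.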