Recent advancements in LLMs have showcased their remarkable role-playing capabilities: they can accurately simulate the dialogue styles and cognitive processes of various roles based on different instructions and contexts. Studies indicate that assigning LLMs the roles of experts, a strategy known as role-play prompting, can enhance their performance in the corresponding domains. However, such prompts must be manually designed for each given problem, requiring a degree of expertise and iterative refinement. To this end, we propose self-prompt tuning, which enables LLMs to generate role-play prompts themselves through fine-tuning. Using the LIMA dataset as our foundational corpus, we employ GPT-4 to annotate role-play prompts for each data point, resulting in the LIMA-Role dataset. We then fine-tune LLMs such as Llama-2-7B and Mistral-7B on LIMA-Role. Consequently, self-prompt-tuned LLMs can automatically generate expert role prompts for any given question. We extensively evaluate self-prompt-tuned LLMs on widely used NLP benchmarks and open-ended question tests. Our empirical results show that self-prompt-tuned LLMs outperform standard instruction-tuned baselines across most datasets. This highlights the great potential of using fine-tuning to enable LLMs to self-prompt, thereby automating complex prompting strategies. We release the dataset, models, and code at this \href{https://anonymous.4open.science/r/Self-Prompt-Tuning-739E/}{url}.
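To make the data-construction step concrete, the following is a minimal sketch of how a LIMA example might be turned into a LIMA-Role training pair. The field names, prompt template, and helper function are illustrative assumptions, not the paper's actual annotation pipeline (which uses GPT-4 to write the role prompts).

```python
# Hypothetical sketch: convert a (question, answer) pair from LIMA into a
# LIMA-Role supervised fine-tuning example. In the paper, the role prompt
# is annotated by GPT-4; here it is passed in as a plain string.

def build_role_example(question: str, answer: str, role_prompt: str) -> dict:
    """Build a training pair where the model learns to first emit an
    expert role-play prompt, then produce the answer."""
    # The target concatenates the role prompt and the original answer,
    # so the fine-tuned model generates its own role prompt at inference.
    target = f"{role_prompt}\n\n{answer}"
    return {"input": question, "output": target}

example = build_role_example(
    question="Why is the sky blue?",
    answer="Shorter wavelengths of sunlight are scattered more strongly by air molecules.",
    role_prompt="You are a physicist specializing in atmospheric optics.",
)
```

At inference time, a model fine-tuned on such pairs would, by design, prepend a self-generated expert role prompt before answering any new question.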