Large Language Models (LLMs) have gained widespread popularity due to their ability to perform ad hoc Natural Language Processing (NLP) tasks from a simple natural language prompt. Part of the appeal of LLMs is their approachability to the general public, including individuals with no prior technical experience in NLP techniques. However, natural language prompts can vary significantly in their linguistic structure, context, and other semantic aspects, and modifying one or more of these aspects can produce significant differences in task performance. Non-expert users may find it challenging to identify the changes needed to improve a prompt, especially when they lack both domain-specific knowledge and appropriate feedback. To address this challenge, we present PromptAid, a visual analytics system designed to interactively create, refine, and test prompts through exploration, perturbation, testing, and iteration. PromptAid uses multiple coordinated visualizations that allow users to improve prompts through three strategies: keyword perturbations, paraphrasing perturbations, and obtaining the best set of in-context few-shot examples. PromptAid was designed through an iterative prototyping process involving NLP experts and was evaluated through quantitative and qualitative assessments for LLMs. Our findings indicate that PromptAid helps users iterate over prompt template alterations with less cognitive overhead, generate diverse prompts with the help of recommendations, and analyze the performance of the generated prompts, while surpassing existing state-of-the-art prompting interfaces in performance.