The rapid advancement and impressive capabilities of large language models (LLMs) have given rise to the field of prompt engineering, the practice of crafting inputs to guide LLMs toward high-quality, task-relevant outputs. A critical challenge facing the field is the lack of standardised prompt documentation and evaluation practices. Prompts can be long, complex and difficult to evaluate on subjective tasks. To address this challenge, we propose the use of prompt cards, structured summaries of prompt engineering practices inspired by the concept of model cards. Through prompt cards, the specific goals, considerations and steps taken during prompt engineering can be systematically documented and assessed. We present the prompt card approach and illustrate it on a specific task called wordalisation, in which structured numerical data is transformed into text. We argue that a well-structured prompt card can improve reproducibility, transparency and prompt methodology, and provide an effective alternative to benchmarking for judging the quality of generated texts. By systematically capturing underlying model details, prompt intent, contextualisation strategies, evaluation practices and ethical considerations, prompt cards make explicit the often implicit design decisions that shape system behaviour. Documenting these choices is important as prompting increasingly involves complex pipelines with multiple moving parts.