The limits of open-ended generative models are unclear, yet increasingly important. What causes them to succeed and what causes them to fail? In this paper, we take a prompt-centric approach to analyzing and bounding the abilities of open-ended generative models. We present a general analysis methodology built on two challenging types of prompt constraint: structural and stylistic. Each type is broken down into a set of well-defined constraints, each of which can be analyzed with a single prompt. We then systematically create a diverse set of simple, natural, and useful prompts to robustly analyze each individual constraint. Using the GPT-3 text-davinci-002 model as a case study, we generate outputs from our collection of prompts and analyze the model's generative failures. We also show that the proposed method generalizes to other large models such as BLOOM and OPT. Our results and our in-context mitigation strategies reveal open challenges for future research. We have publicly released our code at https://github.com/SALT-NLP/Bound-Cap-LLM.