When LLMs perform zero-shot inference, they typically take a prompt containing a task specification and generate a completion. However, no prior work has explored the reverse direction: going from a completion back to the task specification. In this paper, we use both directions to perform cycle-supervised learning entirely in-context. Our goal is to create a forward map f : X -> Y (e.g. image -> generated caption), coupled with a backward map g : Y -> X (e.g. caption -> generated image), and to construct a cycle-consistency "loss" (formulated as an update to the prompt) that enforces g(f(X)) ~= X. The technique, called CyclePrompt, uses cycle-consistency as a free supervisory signal to iteratively craft the prompt. Importantly, CyclePrompt improves model performance without expensive fine-tuning, without training data, and without the complexity of external environments (e.g. compilers, APIs). We demonstrate CyclePrompt in two domains: code generation and image captioning. Our results on the HumanEval coding benchmark place us first on the leaderboard among models that do not rely on extra training data or external environments, and third overall. Compared to the GPT4 baseline, we improve accuracy from 80.5% to 87.2%. In the vision-language space, we generate detailed image captions that outperform baseline zero-shot GPT4V captions when tested against natural (VQAv2) and diagrammatic (FigureQA) visual question-answering benchmarks. To the best of our knowledge, this is the first use of self-supervised learning for prompting.
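The cycle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `cycle_prompt`, `forward`, `backward`, `critique`, and the `toy_*` helpers are hypothetical, the string-similarity score is an assumed stand-in for whatever comparison between X and g(f(X)) the method actually uses, and the toy functions replace real model calls so that the loop can be run without an LLM.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def similarity(a: str, b: str) -> float:
    """Assumed stand-in for comparing x with its reconstruction g(f(x))."""
    return SequenceMatcher(None, a, b).ratio()


def cycle_prompt(
    x: str,
    forward: Callable[[str, str], str],   # f: (x, accumulated hints) -> y
    backward: Callable[[str], str],       # g: y -> reconstructed x
    critique: Callable[[str, str], str],  # describes what g(f(x)) lost relative to x
    max_cycles: int = 3,
    threshold: float = 0.95,
) -> Tuple[str, List[float]]:
    """Refine y = f(x) by feeding cycle-consistency feedback back into the prompt."""
    hints = ""
    scores: List[float] = []
    y = ""
    for _ in range(max_cycles):
        y = forward(x, hints)          # forward pass with the current prompt
        x_hat = backward(y)            # backward pass: reconstruct the input
        score = similarity(x, x_hat)   # the cycle-consistency "loss" signal
        scores.append(score)
        if score >= threshold:
            break
        # The "update" is applied to the prompt, not to model weights.
        hints = (hints + "\n" + critique(x, x_hat)).strip()
    return y, scores


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_forward(x: str, hints: str) -> str:
    """Keeps the first three words of x plus any word mentioned in the hints."""
    hinted = set(hints.split())
    words = x.split()
    return " ".join(w for i, w in enumerate(words) if i < 3 or w in hinted)


def toy_backward(y: str) -> str:
    """Reconstructs the input from y; here simply the identity."""
    return y


def toy_critique(x: str, x_hat: str) -> str:
    """Lists the words of x that are missing from the reconstruction."""
    have = set(x_hat.split())
    return " ".join(w for w in x.split() if w not in have)


spec = "add two numbers and return the sum as a float"
result, scores = cycle_prompt(spec, toy_forward, toy_backward, toy_critique)
print(result)
print(scores)
```

In this toy run the first cycle loses most of the specification, the critique names the missing words, and the second forward pass recovers them. With real models, `forward`, `backward`, and `critique` would each be a prompted call, and the stopping rule would depend on the task.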