The performance of large language models (LLMs) may vary across different prompts or instructions, even for the same task. One commonly recognized factor behind this phenomenon is the model's familiarity with the given prompt or instruction, which is typically estimated by its perplexity. However, finding the prompt with the lowest perplexity is challenging, given the enormous space of possible prompt phrasings. In this paper, we propose monotonic paraphrasing (MonoPara), an end-to-end decoding strategy that paraphrases given prompts or instructions into lower-perplexity counterparts. It is based on an ensemble of a paraphrase LM, which rewrites the prompt (or instruction), and a target LM (i.e., the prompt or instruction executor), which constrains the generation toward lower perplexity. The ensemble decoding process can efficiently paraphrase the original prompt without altering its semantic meaning, while monotonically decreasing the perplexity of each generation as calculated by the target LM. We explore in detail both greedy and search-based decoding as two alternative decoding schemes for MonoPara. Notably, MonoPara requires no training and monotonically lowers the perplexity of the paraphrased prompt or instruction, leading to improved zero-shot LM prompting performance across a wide selection of tasks. In addition, MonoPara is shown to effectively improve LMs' generalization to perturbed and unseen task instructions.
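To make the two central quantities concrete, the following is a minimal sketch and not the paper's implementation. It shows perplexity computed from token log-probabilities, and one greedy ensemble step that scores each candidate token by a weighted sum of the paraphrase LM's and the target LM's log-probabilities. The weighted-sum rule, the mixing weight `alpha`, the function names, and the toy distributions are all assumptions made for illustration; the abstract does not specify how the two models are combined. A single step of this kind also does not by itself guarantee the monotonic perplexity decrease described above.

```python
import math


def perplexity(token_logprobs):
    """Perplexity: exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def ensemble_greedy_step(para_logprobs, target_logprobs, alpha=0.5):
    """Pick the next token from a weighted sum of two models' log-probs.

    para_logprobs / target_logprobs: dict mapping token -> log-probability
    of that token being next, under the paraphrase LM and the target LM.
    alpha: hypothetical mixing weight (1.0 = paraphrase LM only,
    0.0 = target LM only). This combination rule is an assumption.
    """
    # Only tokens scored by both models are candidates; sort for determinism.
    shared = sorted(para_logprobs.keys() & target_logprobs.keys())
    return max(
        shared,
        key=lambda t: alpha * para_logprobs[t] + (1 - alpha) * target_logprobs[t],
    )


# Toy next-token distributions (made-up numbers, for illustration only).
para = {"condense": math.log(0.5), "summarize": math.log(0.4), "abridge": math.log(0.1)}
target = {"summarize": math.log(0.6), "abridge": math.log(0.3), "condense": math.log(0.1)}

choice_para_only = ensemble_greedy_step(para, target, alpha=1.0)
choice_ensemble = ensemble_greedy_step(para, target, alpha=0.5)
```

In this toy setting the paraphrase LM alone would choose "condense", while the ensemble chooses "summarize", the token to which the target LM assigns higher probability. This illustrates the intended effect of the target LM steering the paraphrase toward wording it finds more familiar.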