The performance of large language models (LLMs) can vary across different prompts or instructions, even for the same task. One commonly recognized factor behind this phenomenon is the model's familiarity with the given prompt or instruction, which is typically estimated by its perplexity. However, finding the prompt with the lowest perplexity is challenging, given the enormous space of possible prompt phrasings. In this paper, we propose monotonic paraphrasing (MonoPara), an end-to-end decoding strategy that paraphrases given prompts or instructions into lower-perplexity counterparts. It relies on an ensemble of a paraphrase LM, which rewrites the prompt (or instruction), and a target LM (i.e., the prompt or instruction executor), which constrains the generation toward lower perplexity. The ensemble decoding process can efficiently paraphrase the original prompt without altering its semantic meaning, while monotonically decreasing the perplexity of each generation as calculated by the target LM. We explore in detail both greedy and search-based decoding as two alternative decoding schemes for MonoPara. Notably, MonoPara does not require any training and can monotonically lower the perplexity of the paraphrased prompt or instruction, leading to improved zero-shot LM prompting performance as evaluated on a wide selection of tasks. In addition, MonoPara is shown to effectively improve LMs' generalization on perturbed and unseen task instructions.
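To make the idea of ensemble decoding concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes the two models are combined by a weighted sum of next-token log-probabilities with a hypothetical weight `alpha`; the abstract does not specify the actual combination rule or how monotonicity is enforced. The bigram tables `PARA_LM` and `TARGET_LM`, the function names, and all probabilities are invented for illustration. Perplexity is computed in the standard way, as the exponential of the negative mean token log-probability under the target LM.

```python
import math

# Hypothetical toy "models": next-token probabilities keyed by the previous token.
# The paraphrase LM mildly prefers wording that the target LM finds unfamiliar.
PARA_LM = {
    "<s>": {"condense": 0.6, "summarize": 0.4},
    "condense": {"the": 1.0},
    "summarize": {"the": 1.0},
    "the": {"passage": 0.55, "text": 0.45},
    "passage": {"</s>": 1.0},
    "text": {"</s>": 1.0},
}
TARGET_LM = {
    "<s>": {"condense": 0.1, "summarize": 0.9},
    "condense": {"the": 1.0},
    "summarize": {"the": 1.0},
    "the": {"passage": 0.2, "text": 0.8},
    "passage": {"</s>": 1.0},
    "text": {"</s>": 1.0},
}


def greedy_ensemble_decode(para_lm, target_lm, alpha, max_len=10):
    """Greedily pick the token maximizing a weighted sum of both models' log-probs.

    alpha = 1.0 uses the paraphrase LM alone; smaller values give the target LM
    more influence. The weighted sum is an assumption made for this sketch.
    """
    tokens, context = [], "<s>"
    for _ in range(max_len):
        para_probs = para_lm[context]
        target_probs = target_lm[context]
        best_token, best_score = None, -math.inf
        for token, p_para in para_probs.items():
            p_target = target_probs.get(token, 1e-9)  # floor for unseen tokens
            score = alpha * math.log(p_para) + (1 - alpha) * math.log(p_target)
            if score > best_score:
                best_token, best_score = token, score
        tokens.append(best_token)
        context = best_token
        if best_token == "</s>":
            break
    return tokens


def perplexity(lm, tokens):
    """exp of the negative mean log-probability of `tokens` under `lm`."""
    context, total_log_prob = "<s>", 0.0
    for token in tokens:
        total_log_prob += math.log(lm[context][token])
        context = token
    return math.exp(-total_log_prob / len(tokens))


para_only_out = greedy_ensemble_decode(PARA_LM, TARGET_LM, alpha=1.0)
ensemble_out = greedy_ensemble_decode(PARA_LM, TARGET_LM, alpha=0.5)

print("paraphrase LM only:", para_only_out, perplexity(TARGET_LM, para_only_out))
print("ensemble:          ", ensemble_out, perplexity(TARGET_LM, ensemble_out))
```

In this toy setting, decoding with the paraphrase LM alone yields wording the target LM assigns low probability to, while the ensemble steers toward a phrasing with lower target-LM perplexity. A search-based variant would keep several candidate prefixes per step instead of a single greedy choice.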