Large language models (LLMs) can be viewed as atomic units of computation that map sequences to a distribution over sequences. They can therefore act as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer. By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-layer language network (DLN-1). We then present an extension to 2-layer DLNs (DLN-2), where two prompts must be learned. The key idea is to treat the output of the first layer as a latent variable that must be inferred, and the prompts as the learnable parameters of the generative distribution. We first test the effectiveness of DLN-1 on multiple reasoning and natural language understanding tasks. We then show that DLN-2 can reach higher performance than a single layer, which suggests that a DLN might reach performance comparable to GPT-4 even when each LLM in the network is smaller and less powerful.
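To make the stacking concrete, the following is a minimal sketch of a 2-layer forward pass, not the authors' implementation. The names `LLM`, `language_layer`, `dln2_forward`, and `toy_llm` are hypothetical, the prompt template (prompt, blank line, input) is an assumption, and `toy_llm` is a deterministic stand-in so the example runs without a real, sampling model.

```python
from typing import Callable, Tuple

# Hypothetical LLM interface (an assumption, not a real library API):
# takes the full text fed to the model and returns one completion.
LLM = Callable[[str], str]


def language_layer(llm: LLM, prompt: str, x: str) -> str:
    """One language layer: the prompt plays the role of the learnable parameter."""
    # Assumed template: prompt, blank line, then the layer input.
    return llm(f"{prompt}\n\n{x}")


def dln2_forward(llm: LLM, prompt_1: str, prompt_2: str, x: str) -> Tuple[str, str]:
    """Stack two layers: the first layer's output h is the input to the second."""
    h = language_layer(llm, prompt_1, x)  # intermediate text, the latent variable
    y = language_layer(llm, prompt_2, h)  # final output
    return h, y


def toy_llm(text: str) -> str:
    """Deterministic stand-in for an LLM: reverses the last line of its input."""
    return text.splitlines()[-1][::-1]


if __name__ == "__main__":
    h, y = dln2_forward(toy_llm, "Think step by step.", "Answer:", "abc")
    print(h, y)
```

In this sketch, learning a DLN-2 would mean searching over the two strings `prompt_1` and `prompt_2`. Because `h` is never observed in the training data, it has to be inferred, which is why the first layer's output is treated as a latent variable.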