System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they are stable, so that the chatbot continues to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models such as LLaMA2-chat-70B and GPT-3.5, we reveal significant instruction drift within eight rounds of conversation. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
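To make the attention-rebalancing idea concrete, the sketch below shows one way a post-softmax reweighting toward the system-prompt tokens could be implemented. This is a minimal illustration, not the paper's exact formulation: the function name `split_softmax`, the exponent rule that boosts the pooled system-prompt mass from p to p^alpha, and the parameter `alpha` are all assumptions made for the example.

```python
# Illustrative sketch of rebalancing attention toward system-prompt tokens,
# in the spirit of split-softmax. The rescaling rule (raising the pooled
# system-prompt mass to a power alpha < 1 and renormalizing the rest) is an
# assumption for illustration, not necessarily the method's exact formula.
import torch

def split_softmax(attn_probs: torch.Tensor, n_sys: int, alpha: float = 0.7) -> torch.Tensor:
    """Rebalance a post-softmax attention distribution toward the system prompt.

    attn_probs: (..., seq_len) attention probabilities over the context.
    n_sys: number of leading tokens belonging to the system prompt.
    alpha: exponent in (0, 1]; smaller values boost system-prompt mass more.
    """
    # Current total mass on the system-prompt tokens.
    sys_mass = attn_probs[..., :n_sys].sum(dim=-1, keepdim=True)
    # Boost it: for p in (0, 1] and alpha <= 1, p**alpha >= p.
    boosted = sys_mass.clamp(min=1e-12) ** alpha
    boosted = boosted.clamp(max=1.0)

    # Scale system-prompt probabilities up to the boosted mass and the
    # remaining probabilities down, so each row still sums to 1.
    sys_scale = boosted / sys_mass.clamp(min=1e-12)
    rest_scale = (1.0 - boosted) / (1.0 - sys_mass).clamp(min=1e-12)

    out = attn_probs.clone()
    out[..., :n_sys] = attn_probs[..., :n_sys] * sys_scale
    out[..., n_sys:] = attn_probs[..., n_sys:] * rest_scale
    return out
```

In this sketch the intervention is applied after the ordinary softmax and leaves the model's parameters untouched, which is what makes such an approach lightweight at inference time.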