Chain-of-thought (CoT) prompting has been shown empirically to improve the accuracy of large language models (LLMs) on various question answering tasks. Understanding why CoT prompting is effective is crucial to ensuring that this improvement stems from desired model behavior, and it is a prerequisite for responsible model deployment, yet little work has addressed the question. We address it using gradient-based feature attribution methods, which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importance they assign to particular input tokens. Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt compared to standard few-shot prompting, it increases the robustness of saliency scores to question perturbations and variations in model output.
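To make the notion of a gradient-based saliency score concrete, the following is a minimal sketch of one common variant, gradient-times-input, on a toy scorer. The abstract does not state which attribution method or model architecture is used, so everything here is an illustrative assumption: the function name `saliency_scores`, the toy model (a sum over token embeddings followed by a `tanh` layer), and the normalization of absolute scores to sum to one.

```python
import math


def saliency_scores(embeddings, W, v):
    """Gradient-times-input saliency for a toy scorer (hypothetical model).

    Toy model: score = v . tanh(W @ sum_i e_i), where e_i is the embedding
    of input token i. Returns one non-negative score per token; the scores
    sum to 1 so they can be read as relative importance.
    """
    dim = len(embeddings[0])
    n_hidden = len(W)

    # Forward pass: pool token embeddings by summation, then apply W.
    pooled = [sum(e[d] for e in embeddings) for d in range(dim)]
    hidden_pre = [sum(W[h][d] * pooled[d] for d in range(dim))
                  for h in range(n_hidden)]

    # Backward pass (analytic), using d tanh(x)/dx = 1 - tanh(x)^2.
    grad_hidden = [v[h] * (1.0 - math.tanh(hidden_pre[h]) ** 2)
                   for h in range(n_hidden)]
    grad_pooled = [sum(grad_hidden[h] * W[h][d] for h in range(n_hidden))
                   for d in range(dim)]

    # Pooling is a sum, so d score / d e_i equals grad_pooled for every token.
    # Gradient-times-input takes the dot product of that gradient with e_i.
    raw = [sum(g * x for g, x in zip(grad_pooled, e)) for e in embeddings]

    # Normalize absolute values into relative importances.
    total = sum(abs(r) for r in raw) or 1.0
    return [abs(r) / total for r in raw]


# Usage with made-up 2-dimensional embeddings for three tokens.
W = [[0.5, -0.2], [0.1, 0.3]]
v = [1.0, -1.0]
token_embeddings = [[1.0, 0.0], [0.0, 0.0], [0.5, 2.0]]
scores = saliency_scores(token_embeddings, W, v)
```

In practice the gradient would be obtained by automatic differentiation through a full LLM rather than derived by hand. Comparing such per-token scores between a CoT prompt and a standard few-shot prompt, or between a question and a perturbed version of it, is the kind of analysis the abstract describes.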