Large language models increasingly rely on self-explanations, such as chain-of-thought reasoning, to improve performance on multi-step question answering. While these explanations enhance accuracy, they are often verbose and costly to generate, raising the question of how much explanation is truly necessary. In this paper, we examine the trade-off between sufficiency, defined as the ability of an explanation to justify the correct answer, and conciseness, defined as the reduction in explanation length. Building on the information bottleneck principle, we conceptualize explanations as compressed representations that retain only the information essential for producing correct answers. To operationalize this view, we introduce an evaluation pipeline that constrains explanation length and assesses sufficiency using multiple language models on the ARC Challenge dataset. To broaden the scope, we conduct experiments in both English, using the original dataset, and Persian, a lower-resource language, using a translated version. Our experiments show that more concise explanations often remain sufficient: they preserve accuracy while substantially reducing explanation length, whereas excessive compression degrades performance.