Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require reasoning about the model's own decision-making process, a capability that may be particularly sensitive to quantization. As SEs are increasingly relied upon for transparency in high-stakes applications, understanding whether and to what extent quantization degrades SE quality and faithfulness is critical. To address this gap, we examine two types of SEs, natural language explanations (NLEs) and counterfactual examples, generated by LLMs quantized with three common techniques at distinct bit widths. Our findings indicate that quantization typically causes moderate declines in both SE quality (up to 4.4\%) and faithfulness (up to 2.38\%). A user study further shows that quantization diminishes both the coherence and the trustworthiness of SEs (by up to 8.5\%). Compared with smaller models, larger models show limited resilience to quantization in SE quality but better preserve faithfulness. Moreover, no single quantization technique consistently excels across task accuracy, SE quality, and faithfulness. Because quantization's impact varies by context, we recommend validating SE quality for each specific use case, especially for NLEs, which are more sensitive. Nonetheless, the relatively minor deterioration in SE quality and faithfulness does not undermine quantization's effectiveness as a model compression technique.