Large language models (LLMs) are increasingly deployed in real-world scenarios with the help of recent model compression techniques. This momentum towards local deployment means that compressed LLMs will impact a large population. However, prior analyses often prioritize preserving perplexity, which is a direct analogy to training loss. The impact of compression methods on other critical aspects of model behavior, particularly safety, still calls for systematic assessment. To this end, we investigate the impact of model compression along four dimensions: (1) degeneration harm, i.e., bias and toxicity in generation; (2) representational harm, i.e., biases in discriminative tasks; (3) dialect bias; and (4) language modeling and downstream task performance. We cover a wide spectrum of LLM compression techniques, including unstructured pruning, semi-structured pruning, and quantization. Our analysis reveals that compression can lead to unexpected consequences. Although compression may unintentionally remedy LLMs' degeneration harm, it can still exacerbate representational harm. Moreover, different protected groups are affected divergently as the compression rate grows. Finally, different compression methods have drastically different safety impacts: for example, quantization mostly preserves bias, whereas pruning degrades quickly. Our findings underscore the importance of integrating safety assessments into the development of compressed LLMs to ensure their reliability across real-world applications. Our full results are available here: \url{https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval}