Language models (LMs) excel at tasks across diverse domains, yet they require substantial computational resources at inference time. Compression techniques such as pruning and quantization offer a practical path toward efficient LM deployment, as demonstrated by their ability to preserve performance on general-purpose benchmarks. However, general-purpose LM compression methods can degrade performance in specialized domains (e.g., biomedical or legal). Recent work has sought to address this issue but relies on a computationally expensive full-parameter fine-tuning pipeline. To address this limitation, we propose MixCal, a novel calibration method designed to improve the in-domain performance of compressed LMs in a post-training setting. Through extensive experimentation, we demonstrate that MixCal substantially outperforms existing approaches on domain-specific tasks while preserving general performance. Notably, these gains are achieved while also reducing the computational cost of LM compression.