Real-world business applications require a trade-off between language model performance and size. We propose a new method for model compression that relies on vocabulary transfer. We evaluate the method on various vertical domains and downstream tasks. Our results indicate that vocabulary transfer can be effectively combined with other compression techniques, yielding a significant reduction in model size and inference time with only a marginal loss in performance.