Large language models (LLMs) have transformed many areas of natural language processing, including machine translation. However, deploying LLMs efficiently remains challenging because of their substantial computational requirements. In this paper, we address this challenge and present our submissions to the Model Compression track of the Conference on Machine Translation (WMT 2025). In our experiments, we investigate iterative layer pruning guided by layer importance analysis. We evaluate this method with the Aya-Expanse-8B model on translation from Czech to German and from English to Egyptian Arabic. Our approach achieves substantial reductions in model size and inference time while maintaining the translation quality of the baseline models.
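To make the core idea concrete, the following is a minimal, self-contained sketch of iterative layer pruning guided by an importance score. The importance proxy used here, cosine similarity between a layer's input and output hidden states (layers that barely transform their input are scored as less important), is an assumption borrowed from common layer-pruning practice, not necessarily the analysis used in the paper; the toy blocks, function names, and calibration batch are likewise illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def layer_importance(layers, inputs):
    """Assumed proxy: score each layer by how much it changes its input.

    High cosine similarity between input and output hidden states means the
    layer is close to an identity map, so it receives a low importance score.
    """
    scores = []
    hidden = inputs
    with torch.no_grad():
        for layer in layers:
            out = layer(hidden)
            sim = F.cosine_similarity(
                hidden.flatten(1), out.flatten(1), dim=1
            ).mean()
            scores.append(1.0 - sim.item())  # near-identity layer -> low score
            hidden = out
    return scores


def iterative_layer_prune(layers, inputs, n_prune):
    """Iterative pruning loop: re-score after each removal, since dropping
    one layer changes the hidden states seen by all layers after it."""
    layers = list(layers)
    for _ in range(n_prune):
        scores = layer_importance(layers, inputs)
        drop = min(range(len(layers)), key=lambda i: scores[i])
        del layers[drop]
    return layers


if __name__ == "__main__":
    # Toy stand-in for a transformer stack: 8 small feed-forward blocks
    # scored on a random calibration batch (hypothetical setup).
    torch.manual_seed(0)
    blocks = nn.ModuleList(
        nn.Sequential(nn.Linear(16, 16), nn.Tanh()) for _ in range(8)
    )
    calib = torch.randn(32, 16)
    pruned = iterative_layer_prune(blocks, calib, n_prune=3)
    print(f"kept {len(pruned)} of {len(blocks)} layers")
```

In a real LLM setting, the toy blocks would be the model's decoder layers and the calibration batch would be hidden states from a small sample of in-domain text; the loop structure, however, is the same: score, drop the least important layer, and re-score before the next removal.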