This paper presents a systematic benchmark of state-of-the-art multilingual large language models (LLMs) adapted via token pruning, a compression technique that removes the tokens, and the corresponding embedding parameters, of languages irrelevant to the target application. Focusing on Korean-centric natural language processing (NLP) tasks, we evaluate model families including Qwen3, Gemma-3, Llama-3, and Aya under three vocabulary configurations: Original, English-Korean (EnKo), and English-Korean-Chinese (EnKoZh). Performance is assessed on established benchmarks covering general aptitude, cultural literacy, instruction following, and machine translation. Our findings indicate that token pruning markedly improves generation stability by eliminating language confusion and, in machine translation, frequently improves performance on Korean-specific tasks. Instruction-following capability varies by architecture, a variance linked to latent cross-lingual representations. Nonetheless, the substantial reduction in vocabulary size establishes token pruning as a highly effective optimization strategy for memory-constrained, domain-specific deployments, even though the resulting improvements in inference latency are modest.
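To make the vocabulary-pruning step concrete, the following is a minimal illustrative sketch, not the paper's implementation. The exact token-selection rule is not given here, so the example assumes three things. First, tokens are classified by the Unicode script of their characters using Python's `unicodedata`. Second, tokens are stored as readable strings that use SentencePiece's `▁` word-boundary marker. Third, the embedding matrix is a list of rows indexed by token id. The functions `char_script`, `token_allowed`, and `prune_vocab` are hypothetical names.

```python
import unicodedata


def char_script(ch: str) -> str:
    """Coarse script label for one character (hypothetical classification rule)."""
    if ch.isascii():
        return "latin"  # English letters, digits, ASCII punctuation and spaces
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "hangul"  # Hangul syllables and jamo
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "han"  # Chinese characters (Hanzi/Hanja)
    return "other"  # any script outside the target configuration


def token_allowed(token: str, allowed: set[str]) -> bool:
    """Keep a token only if every character belongs to an allowed script."""
    body = token.replace("\u2581", "")  # ignore the SentencePiece word-boundary marker
    return all(char_script(c) in allowed for c in body)


def prune_vocab(vocab: list[str], embeddings: list[list[float]],
                allowed: set[str], special_tokens: set[str]):
    """Drop tokens (and their embedding rows) outside the allowed scripts.

    Returns the pruned vocabulary, the pruned embedding rows, and a mapping
    from old token ids to new ids. An untied output projection (LM head)
    would have to be sliced with the same kept ids.
    """
    kept_ids = [i for i, tok in enumerate(vocab)
                if tok in special_tokens or token_allowed(tok, allowed)]
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    new_vocab = [vocab[i] for i in kept_ids]
    new_embeddings = [embeddings[i] for i in kept_ids]
    return new_vocab, new_embeddings, old_to_new


if __name__ == "__main__":
    # Toy vocabulary mixing Russian, English, Korean, and Chinese tokens.
    vocab = ["<s>", "\u2581привет", "\u2581the", "\u2581한국", "\u2581中国", "어"]
    embeddings = [[float(i)] * 4 for i in range(len(vocab))]  # toy 4-dim rows
    # EnKo configuration: keep Latin and Hangul tokens plus special tokens.
    enko_vocab, enko_emb, enko_map = prune_vocab(
        vocab, embeddings, {"latin", "hangul"}, {"<s>"})
    print(enko_vocab)
    print(enko_map)  # {0: 0, 2: 1, 3: 2, 5: 3}
```

In this example, the EnKoZh configuration would simply add `"han"` to the allowed set. The returned `old_to_new` mapping is what the tokenizer would need to be remapped with. An embedding matrix for V tokens of hidden size d holds V × d parameters, so the memory saved grows linearly with the number of pruned tokens. For byte-level BPE vocabularies, each token would first have to be decoded to text before a script check like this one could apply.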