We introduce LM-Lexicon, a definition modeling approach that combines data clustering, semantic expert learning, and model merging in a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, each served by a small language model trained as a domain expert, LM-Lexicon achieves substantial improvements over existing methods on five widely used benchmarks (+7% BLEU score compared with the prior state-of-the-art model). Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization, with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained by scaling test-time compute and semantic experts. Our work advances definition modeling while providing insights into the development of efficient language models for semantic-intensive applications.
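To make the contrast between domain-level and token-level routing concrete, the following is a minimal sketch rather than the LM-Lexicon implementation. It assumes that each semantic domain is summarized by a cluster centroid, that an input is represented by the mean of its token embeddings, and that routing picks the centroid with the highest cosine similarity. The function names (`mean_pool`, `route_domain_level`, `route_token_level`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_centroid(vec: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid most similar to vec (assumed routing rule)."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))


def mean_pool(token_embeddings: List[Vector]) -> List[float]:
    """Average token embeddings into one input-level vector (an assumption)."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


def route_domain_level(token_embeddings: List[Vector],
                       centroids: List[Vector]) -> int:
    """Domain-level routing: the whole input is sent to a single expert."""
    return nearest_centroid(mean_pool(token_embeddings), centroids)


def route_token_level(token_embeddings: List[Vector],
                      centroids: List[Vector]) -> List[int]:
    """Token-level routing: every token picks its own expert."""
    return [nearest_centroid(tok, centroids) for tok in token_embeddings]


if __name__ == "__main__":
    # Two hypothetical domain centroids and three toy token embeddings.
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
    print("domain-level expert:", route_domain_level(tokens, centroids))
    print("token-level experts:", route_token_level(tokens, centroids))
```

In this sketch, domain-level routing commits one expert to the entire input, whereas token-level routing may split a single input across several experts. How centroids are obtained and how experts are merged are outside the scope of the sketch.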