Despite rapid progress in LLM-based code generation, existing models are trained predominantly on imperative languages. As a result, functional programming languages (FPLs) such as Haskell, OCaml, and Scala remain chronically underexplored, and even frontier models perform substantially worse on them. Fine-tuning is a natural remedy, but our experiments show that per-language fine-tuning fails to capture shared functional abstractions, while merged multi-language fine-tuning introduces cross-language interference. To address this, we introduce FPMoE, a lightweight, open-source code generation model built on a sparse Mixture-of-Experts (MoE) architecture with three language-specific routed experts (one each for Haskell, OCaml, and Scala) and a shared expert that captures cross-language functional patterns such as monadic reasoning and type-directed programming. This design resolves both failure modes simultaneously: dedicated experts eliminate interference, while the shared expert preserves the abstractions that per-language models miss. On FPEval, FPMoE substantially outperforms the fine-tuned baselines and, with only 3B active parameters, matches the performance of models with far more total parameters, including DeepSeek-Coder-6.7B, Qwen2.5-Coder-14B-Instruct, and Qwen3-Coder-30B-A3B.
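The abstract does not say how tokens are routed, so the following is only a minimal sketch of the shared-plus-routed layout, assuming a learned top-1 router over the three language experts and a shared expert that processes every token. The class and parameter names (`SharedPlusRoutedMoE`, `expert_names`) are hypothetical, and each expert is a single random linear map rather than the feed-forward block a real model would use.

```python
import math
import random


def linear(weights, x):
    """Multiply a (rows x dim) weight matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedPlusRoutedMoE:
    """Hypothetical MoE layer: one always-active shared expert plus
    top-1 routing over language-specific experts (assumed design)."""

    def __init__(self, dim, expert_names, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows):
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(rows)]

        self.expert_names = list(expert_names)
        self.shared = rand_matrix(dim)  # shared expert: runs on every token
        self.routed = {name: rand_matrix(dim) for name in self.expert_names}
        # Router produces one logit per routed expert.
        self.router = rand_matrix(len(self.expert_names))

    def forward(self, x):
        probs = softmax(linear(self.router, x))
        # Top-1 selection: only one routed expert is evaluated (sparsity).
        top = max(range(len(probs)), key=lambda i: probs[i])
        name = self.expert_names[top]
        shared_out = linear(self.shared, x)
        routed_out = linear(self.routed[name], x)
        # Output = shared expert + gate-weighted selected routed expert.
        y = [s + probs[top] * r for s, r in zip(shared_out, routed_out)]
        return y, name


layer = SharedPlusRoutedMoE(dim=4, expert_names=["haskell", "ocaml", "scala"])
y, chosen = layer.forward([0.1, -0.2, 0.3, 0.4])
print(chosen, len(y))
```

In this sketch the shared expert contributes to every output regardless of routing, which is the property the abstract relies on for cross-language abstractions, while the per-token compute stays at two experts out of four.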