We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixed computational budgets. A.X K1 is pre-trained on a corpus of approximately 10T tokens, curated through a multi-stage data-processing pipeline. Designed to bridge the gap between reasoning capability and inference efficiency, A.X K1 supports explicitly controllable reasoning to facilitate scalable deployment across diverse real-world scenarios. We propose a simple yet effective Think-Fusion training recipe that enables users to switch between thinking and non-thinking modes within a single unified model. Extensive evaluations demonstrate that A.X K1 achieves performance competitive with leading open-source models, while establishing a distinctive advantage on Korean-language benchmarks.