We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixed computational budgets. A.X K1 is pre-trained on a corpus of approximately 10T tokens, curated through a multi-stage data processing pipeline. Designed to bridge the gap between reasoning capability and inference efficiency, A.X K1 supports explicitly controllable reasoning to facilitate scalable deployment across diverse real-world scenarios. We propose a simple yet effective Think-Fusion training recipe, enabling user-controlled switching between thinking and non-thinking modes within a single unified model. Extensive evaluations demonstrate that A.X K1 achieves performance competitive with leading open-source models, while establishing a distinctive advantage on Korean-language benchmarks.
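To make the mode-switching claim concrete, the sketch below illustrates how user-controlled thinking could be toggled at inference time. This is a minimal sketch under assumptions not stated in the abstract: it presumes a Hugging Face-style chat template that accepts an `enable_thinking` flag, a convention used by several open-source reasoning models, and `skt/A.X-K1` is a placeholder model identifier, not a confirmed release name.

```python
# Minimal sketch: toggling thinking vs. non-thinking modes in a single model.
# Assumptions (not from the abstract): the released tokenizer ships a chat
# template accepting an `enable_thinking` keyword; "skt/A.X-K1" is hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("skt/A.X-K1")  # placeholder model id

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

# Thinking mode: the template would elicit an explicit reasoning trace
# before the final answer.
prompt_think = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: same weights, same model; the template would suppress
# the reasoning trace for lower-latency responses.
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```

The design point this illustrates is that both behaviors come from one unified checkpoint, so deployments can trade answer latency against reasoning depth per request rather than hosting two separate models.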