This technical report presents Yi-Lightning, our latest flagship large language model (LLM). It achieves exceptional performance, ranking 6th overall on Chatbot Arena, with particularly strong results (2nd to 4th place) in specialized categories including Chinese, Math, Coding, and Hard Prompts. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, featuring advanced expert segmentation and routing mechanisms coupled with optimized KV-caching techniques. Our development process encompasses comprehensive pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), where we devise deliberate strategies for multi-stage training, synthetic data construction, and reward modeling. Furthermore, we implement RAISE (Responsible AI Safety Engine), a four-component framework to address safety issues across pre-training, post-training, and serving phases. Empowered by our scalable super-computing infrastructure, all these innovations substantially reduce training, deployment and inference costs while maintaining high-performance standards. With further evaluations on public academic benchmarks, Yi-Lightning demonstrates competitive performance against top-tier LLMs, while we observe a notable disparity between traditional, static benchmark results and real-world, dynamic human preferences. This observation prompts a critical reassessment of conventional benchmarks' utility in guiding the development of more intelligent and powerful AI systems for practical applications. Yi-Lightning is now available through our developer platform at https://platform.lingyiwanwu.com.