EngGPT2-16B-A3B is the latest iteration of Engineering Group's Italian LLM, built to be a sovereign, efficient, and open model. EngGPT2 is trained on 2.5 trillion tokens (compared with Qwen3's 36T and Llama 3's 15T) yet delivers performance comparable to dense models in the 8B-16B range on key benchmarks, including MMLU-Pro, GSM8K, IFEval, and HumanEval, while requiring one-fifth to one-half of the inference compute and one-tenth to one-sixth of the training data and the corresponding training compute.

Designed as a trained-from-scratch Mixture-of-Experts (MoE) architecture, EngGPT2 has 16 billion total parameters with 3 billion active per token, with expert sizes positioned between those used in GPT-OSS and Qwen3. Approximately 25% of its training corpus consists of Italian-language data, giving it strong capabilities on European and Italian NLP tasks among models of similar scale.

This efficiency positions EngGPT2 as a key contributor to the growing portfolio of open-weight European models, combining performance and efficiency with full alignment to the EU AI Act. EngGPT2 is also a single model capable of multiple reasoning modes: non-reasoning, reasoning in Italian or English, and turbo-reasoning (a concise, bullet-point reasoning style, available in both languages, designed for real-time reasoning use cases). EngGPT2 aims to set a new standard for resource-conscious, high-performance LLMs tailored to European and Italian contexts.
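The "16B total, 3B active" split above is the defining property of a Mixture-of-Experts layer: a router selects a few experts per token, so only a fraction of the weights participate in each forward pass. The sketch below illustrates this with toy dimensions; the expert count, top-k value, and layer sizes are illustrative assumptions, not EngGPT2's published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 64          # hidden size (toy value, not EngGPT2's)
N_EXPERTS = 16  # total experts per layer (assumption for illustration)
TOP_K = 3       # experts activated per token (assumption for illustration)

# Each expert is a small two-layer MLP: D -> 4D -> D.
experts = [
    (rng.standard_normal((D, 4 * D)) * 0.02,
     rng.standard_normal((4 * D, D)) * 0.02)
    for _ in range(N_EXPERTS)
]
router = rng.standard_normal((D, N_EXPERTS)) * 0.02  # token -> expert logits

def moe_forward(x):
    """Route one token vector x through its top-k experts only."""
    logits = x @ router
    topk = np.argsort(logits)[-TOP_K:]        # indices of the chosen experts
    # Softmax over the selected logits only (renormalized gating weights).
    w = np.exp(logits[topk] - logits[topk].max())
    w /= w.sum()
    out = np.zeros_like(x)
    for weight, idx in zip(w, topk):
        w1, w2 = experts[idx]
        out += weight * (np.maximum(x @ w1, 0.0) @ w2)  # ReLU MLP expert
    return out, topk

x = rng.standard_normal(D)
y, chosen = moe_forward(x)

# Total vs. active expert parameters, mirroring the 16B-total / 3B-active idea.
total_params = N_EXPERTS * 2 * D * 4 * D
active_params = TOP_K * 2 * D * 4 * D
print(f"total expert params:  {total_params}")
print(f"active expert params: {active_params} ({active_params / total_params:.0%})")
```

With these toy numbers, only 3 of 16 experts run per token, so roughly 19% of the expert weights are touched per forward pass, which is the same mechanism that lets a 16B-parameter model pay the inference cost of a ~3B dense model.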