We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, a technique that alternates iterative pruning with continued training under a distillation objective. Each model comes with image understanding capabilities, and all are released under the Apache 2.0 license.
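To make the Cascade Distillation loop described above concrete, the following is a minimal sketch of its high-level structure: the model is pruned in stages rather than in one step, and after each pruning stage it is trained to match the previous stage's outputs. The helper callables (`prune_model`, `distillation_step`, `count_params`) and the linear budget schedule are assumptions for illustration, not the authors' actual recipe.

```python
import copy

def cascade_distillation(teacher, target_params, num_stages, train_batches,
                         prune_model, distillation_step, count_params):
    """Iteratively prune `teacher` toward `target_params`, recovering
    quality after each pruning stage by distilling from the model
    produced in the previous stage (hypothetical sketch)."""
    start_params = count_params(teacher)
    student = copy.deepcopy(teacher)
    for stage in range(1, num_stages + 1):
        # Hypothetical schedule: interpolate the parameter budget linearly
        # from the teacher's size down to the target size.
        frac = stage / num_stages
        budget = start_params + frac * (target_params - start_params)
        # The previous stage's model serves as the teacher for this stage.
        previous = copy.deepcopy(student)
        # Prune the least-important weights down to this stage's budget.
        student = prune_model(student, budget)
        # Continued training: recover quality by matching the previous
        # model's output distribution on the training data.
        for batch in train_batches:
            distillation_step(student=student, teacher=previous, batch=batch)
    return student
```

The point of the cascade is that each pruning step is small enough for distillation to recover, which tends to preserve quality better than pruning directly to the final size in one shot.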