We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch on 2 trillion tokens, with architectural choices such as QK Normalization and Z-Loss to ensure training stability. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly on Japanese-specific tasks, achieving results competitive with frontier models such as GPT-4. The base model is available at https://huggingface.co/pfnet/plamo-100b.
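For illustration, here is a minimal PyTorch sketch of the two stability techniques named above. This is not the authors' implementation: it assumes RMS-style normalization of queries and keys for QK Normalization and a PaLM-style squared-log-partition penalty for Z-Loss, and the coefficient `coeff` is a placeholder rather than the value used for PLaMo-100B.

```python
import torch
import torch.nn.functional as F

def qk_norm_attention(q, k, v, eps=1e-6):
    """Scaled dot-product attention with QK Normalization: RMS-normalize
    queries and keys along the head dimension before computing attention
    scores, bounding the logit magnitude to stabilize training.
    (Sketch; assumes RMS-style normalization, not necessarily PLaMo's.)"""
    def rms_norm(x):
        return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def z_loss(logits, coeff=1e-4):
    """Auxiliary Z-Loss in the PaLM style: penalize the squared log of the
    softmax partition function so output logits do not drift to extreme
    values. (coeff is a hypothetical placeholder.)"""
    log_z = torch.logsumexp(logits, dim=-1)  # log of the softmax normalizer
    return coeff * (log_z ** 2).mean()
```

In this formulation the Z-Loss term is simply added to the standard cross-entropy objective during pre-training, nudging the partition function toward 1 without changing the softmax probabilities themselves.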