Large language models (LLMs) are widely used in education, for example to generate children's stories. However, the generated stories are often too difficult for children to read, and the operational cost of LLMs hinders their adoption in educational settings. We took an existing expert-designed children's reading curriculum, together with the corresponding stories generated by GPT-4o and Llama 3.3 70B, and used them to design several fine-tuning experiments for three 8B-parameter LLMs. The fine-tuned models then generated new English reading stories, which we evaluated quantitatively and qualitatively. Our method prioritizes controllability over scale, enabling educators to target reading levels and error patterns with a compact, affordable model. The evaluation shows that, with an appropriate fine-tuning design, children's English reading stories generated by 8B LLMs score better on difficulty-related metrics than those from zero-shot GPT-4o and Llama 3.3 70B, with almost no discernible safety issues. Such fine-tuned LLMs could be used more broadly by teachers, parents, and children, in classrooms and at home, to generate engaging English reading stories that match children's interests while keeping difficulty controllable and content safe.