This study evaluates the pedagogical viability of LLM-generated English as a Foreign Language (EFL) learning content. Using log data from Japanese junior high school students practising on a grammar drilling application, we analysed how question modality affects student performance and whether the theoretical difficulty tiers of the localised CEFR-J framework predict empirical task difficulty. Results reveal a clear performance hierarchy: multiple-choice questions carried the lowest cognitive load, cloze tasks posed the greatest barrier to active recall, and drag-and-drop exercises incurred the heaviest time penalties. Learner data also validated the CEFR-J grammar framework: accuracy declined steadily and response times increased as the proficiency level of the target grammar advanced. These findings demonstrate that LLMs can generate viable learning content, and they highlight the need for developers to sequence question modalities strategically so that learners move from passive recognition to active linguistic production.