AI models (including LLMs) often rely on narrative question-answering (QA) datasets to provide customized QA functionality for downstream children's education applications. However, existing datasets only include QA pairs grounded in the given storybook content, whereas children can learn more when teachers relate the storybook content to real-world knowledge (e.g., commonsense knowledge). We introduce FairytaleCQA, a dataset annotated by children's education experts that supplements 278 storybook narratives with educationally appropriate commonsense knowledge. The dataset has 5,868 QA pairs that not only originate from the storybook narrative but also contain commonsense knowledge grounded in an external knowledge graph (i.e., ConceptNet). A follow-up experiment shows that a smaller model (T5-large) fine-tuned on FairytaleCQA reliably outperforms much larger prompt-engineered LLMs (e.g., GPT-4) on this new QA-pair generation (QAG) task. This result suggests that 1) our dataset brings novel challenges to existing LLMs, and 2) human experts' data annotation is still critical, as experts have nuanced knowledge of the children's education domain that LLMs lack.