Large language models (LLMs) have shown potential for generating educational content at scale, for example by assisting educators in creating practice tasks or by synthesizing data for training educational models. However, LLMs suffer from the ``Artificial Hivemind'' effect: they tend to produce homogeneous content. This homogeneity limits the diversity of LLM-generated tasks, which is a crucial factor in these educational settings. In this paper, we investigate how to increase the diversity of generated tasks while keeping their utility high. Inspired by the divergent and convergent thinking stages described in the creativity literature, we propose a prompting framework with two reasoning stages: (1) exploring the creative space, and (2) satisfying the input requirements. We evaluate CreativeDC, a method instantiated from this framework in the domain of Python programming, using both automated metrics and expert evaluation. Results show that CreativeDC produces significantly more distinct high-utility tasks than the baselines (about $1.6\times$ as many). Our work offers an effective approach for generating and evaluating more diverse tasks at scale.