Generative artificial intelligence (AI) has found widespread use in computing education; at the same time, the quality of generated materials raises concerns among educators and students. This study addresses this issue by introducing a novel method for diagram code generation with in-context examples based on Rhetorical Structure Theory (RST), which aims to improve diagram generation by aligning models' output with user expectations. Our approach is evaluated by computer science educators, who assessed 150 diagrams generated with large language models (LLMs) for logical organization, connectivity, layout aesthetics, and AI hallucination. We additionally investigate the assessment dataset for its utility in automated diagram evaluation. The preliminary results suggest that our method decreases the rate of factual hallucination and improves diagram faithfulness to the provided context; however, due to LLMs' stochasticity, the quality of the generated diagrams varies. Additionally, we present an in-depth analysis and discussion of the connection between AI hallucination and the quality of generated diagrams, which reveals that text contexts of higher complexity lead to higher rates of hallucination and that LLMs often fail to detect mistakes in their own output.