With an increasing focus in STEM education on critical thinking skills, science writing plays an ever more important role in curricula that stress inquiry skills. A recently published dataset of two sets of college-level lab reports from an inquiry-based physics curriculum relies on analytic assessment rubrics with multiple dimensions that specify subject matter knowledge and general components of good explanations. Each analytic dimension is assessed on a 6-point scale to provide detailed feedback that can help students improve their science writing skills. Manual assessment can be slow and difficult to calibrate for consistency across all students in large classes. While much work exists on automated assessment of open-ended questions in STEM subjects, there has been far less work on long-form writing such as lab reports. We present VerAs, an end-to-end neural architecture with separate verifier and assessment modules, inspired by approaches to Open Domain Question Answering (OpenQA). VerAs first verifies whether a report contains any content relevant to a given rubric dimension and, if so, assesses the relevant sentences. On the lab reports, VerAs outperforms multiple baselines based on OpenQA systems or Automated Essay Scoring (AES). VerAs also performs well on an analytic rubric for middle school physics essays.
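
The verify-then-assess control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the VerAs implementation: a toy token-overlap score stands in for the neural verifier, the assessment module is passed in as a plain callable, and every name and parameter here (`verify_then_assess`, `threshold`, `top_k`, `min_score`) is hypothetical.

```python
import re
from typing import Callable, List


def split_sentences(report: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def token_overlap(sentence: str, dimension: str) -> float:
    """Toy relevance score (assumption): fraction of rubric-dimension tokens
    that also appear in the sentence. A stand-in for a learned verifier."""
    dim_tokens = set(re.findall(r"[a-z]+", dimension.lower()))
    sent_tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if not dim_tokens:
        return 0.0
    return len(dim_tokens & sent_tokens) / len(dim_tokens)


def verify_then_assess(
    report: str,
    dimension: str,
    assess: Callable[[List[str], str], int],
    threshold: float = 0.5,  # hypothetical relevance cut-off
    top_k: int = 3,          # hypothetical cap on sentences passed to the assessor
    min_score: int = 0,      # assumed lowest point on the scale
) -> int:
    """Verify that the report addresses the dimension; only then assess it."""
    sentences = split_sentences(report)
    # Rank sentences by relevance to the rubric dimension (stable for ties).
    ranked = sorted(
        ((token_overlap(s, dimension), s) for s in sentences),
        key=lambda pair: pair[0],
        reverse=True,
    )
    relevant = [s for score, s in ranked[:top_k] if score >= threshold]
    # Verification step: no relevant content means the assessor is never called.
    if not relevant:
        return min_score
    # Assessment step: score only the sentences judged relevant.
    return assess(relevant, dimension)
```

In this sketch, a report with no sentence that clears the relevance threshold receives `min_score` without invoking the assessor, which mirrors the separation between the verifier and assessment modules. Any scoring function with the same signature could be supplied as `assess`.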