Disentangling Learning from Judgment: Representation Learning for Open Response Analytics

Open-ended responses are central to learning, yet automated scoring often conflates what students wrote with how teachers grade. We present an analytics-first framework that separates content signals from rater tendencies, making grading judgments visible and auditable. Using de-identified ASSISTments mathematics responses, we model teacher histories as dynamic priors and represent text with sentence embeddings. To reduce problem- and teacher-related confounds, we apply centroid normalization, take response-problem embedding differences, and explicitly model teacher effects with priors. Temporally validated linear models quantify the contribution of each signal, and disagreements between models surface observations for qualitative inspection. Results show that teacher priors heavily influence grade predictions; the strongest results arise when priors are combined with content embeddings (AUC~0.815), while content-only models remain above chance but are substantially weaker (AUC~0.626). Adjusting for rater effects sharpens the selection of features derived from content representations, retaining more informative embedding dimensions and revealing cases where semantic evidence supports understanding rather than surface-level differences in how students respond. The contribution is a practical pipeline that turns embeddings from mere features into learning analytics for reflection, enabling teachers and researchers to examine where grading practices align (or conflict) with evidence of student reasoning and learning.
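
To make the preprocessing steps named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary grades, a smoothed running mean as the "dynamic prior" (the abstract does not specify the form of the prior), and plain Python lists standing in for sentence embeddings. All function names and the `global_rate` and `strength` parameters are hypothetical.

```python
from collections import defaultdict


def dynamic_teacher_priors(records, global_rate=0.5, strength=2.0):
    """Return one prior per record, in input order.

    records: list of (timestamp, teacher_id, grade), grade assumed in {0, 1}.
    Each prior uses only that teacher's grades from strictly earlier records,
    so no information from the current or later responses leaks in.
    The smoothing toward global_rate is an illustrative assumption.
    """
    order = sorted(range(len(records)), key=lambda i: records[i][0])
    sums = defaultdict(float)
    counts = defaultdict(int)
    priors = [0.0] * len(records)
    for i in order:
        _, teacher, grade = records[i]
        # Smoothed running mean: stays near global_rate while history is short.
        priors[i] = (sums[teacher] + strength * global_rate) / (counts[teacher] + strength)
        sums[teacher] += grade
        counts[teacher] += 1
    return priors


def centroid_normalize(embeddings, problem_ids):
    """Subtract each problem's mean embedding (its centroid) from its responses."""
    groups = defaultdict(list)
    for vec, pid in zip(embeddings, problem_ids):
        groups[pid].append(vec)
    centroids = {
        pid: [sum(col) / len(vecs) for col in zip(*vecs)]
        for pid, vecs in groups.items()
    }
    return [
        [x - c for x, c in zip(vec, centroids[pid])]
        for vec, pid in zip(embeddings, problem_ids)
    ]


def response_problem_difference(response_vec, problem_vec):
    """Element-wise difference between a response embedding and its problem embedding."""
    return [r - p for r, p in zip(response_vec, problem_vec)]
```

In a pipeline of this kind, the prior and the normalized content features would be fed to a linear model trained on earlier responses and evaluated on later ones. Computing the prior strictly from past grades is what keeps such a temporal split honest.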
