Every four years, the OECD administers the PISA test to assess the knowledge of teenage students worldwide and to allow comparisons between educational systems. However, the need to control for language differences and annotator bias makes grading student answers challenging. For these reasons, it would be interesting to compare methods for automatic student answer grading. Training the methods that rely on machine learning, and computing parameters or selecting hyperparameters for those that do not, requires a large amount of domain-specific data. In this work, we explore a small number of methods for creating a large-scale training dataset using only a relatively small confidential dataset as a reference, leveraging a set of very simple derived text formats to preserve confidentiality. Using these methods, we created three surrogate datasets that are, at the very least, superficially more similar to the reference dataset than the output of purely prompt-based generation. Early experiments suggest that one of these approaches might also lead to improved model training.