Item response theory (IRT) is a class of interpretable factor models that are widely used in computerized adaptive tests (CATs), such as language proficiency tests. Traditionally, these models are fit with parametric mixed-effects models of the probability that a test taker answers a test item (i.e., question) correctly. Neural-network extensions of these models, such as BERT-IRT, require specialized architectures and parameter tuning. We propose a multistage fitting procedure that is compatible with out-of-the-box Automated Machine Learning (AutoML) tools. It is based on a Monte Carlo EM (MCEM) outer loop with a two-stage inner loop, which trains a non-parametric AutoML-grade model on item features and then an item-specific parametric model. This greatly accelerates the modeling workflow for scoring tests. We demonstrate its effectiveness by applying it to the Duolingo English Test, a high-stakes, online English proficiency test. We show that the resulting model is typically better calibrated, achieves better predictive performance, and produces more accurate scores than existing methods (non-explanatory IRT models and explanatory IRT models such as BERT-IRT). Along the way, we provide a brief survey of machine learning methods for calibrating item parameters in CATs.
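To make the procedure concrete, here is a minimal Python sketch of the MCEM outer loop with the two-stage inner loop, run on synthetic data. It is not the authors' implementation: the Rasch response model, the Metropolis sampler in the E-step, scikit-learn's GradientBoostingClassifier standing in for an out-of-the-box AutoML model, and the per-item Newton refinement in the second inner-loop stage are all simplifying assumptions made for illustration.

```python
# Illustrative sketch only: Rasch model, Metropolis E-step, and a gradient
# boosting classifier as an AutoML stand-in are assumptions, not the paper's
# actual pipeline.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic data: N test takers, J items, K observed item features.
N, J, K = 200, 50, 5
item_feats = rng.normal(size=(J, K))
true_b = 0.5 * item_feats @ rng.normal(size=K)   # difficulties driven by features
true_theta = rng.normal(size=N)                  # latent abilities
P = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_b[None, :])))
Y = rng.binomial(1, P)                           # correct/incorrect responses

def loglik(theta, b):
    """Per-test-taker Rasch log-likelihood of the observed responses."""
    logits = theta[:, None] - b[None, :]
    return (Y * logits - np.logaddexp(0.0, logits)).sum(axis=1)

b_hat = np.zeros(J)    # current item difficulty estimates
theta = np.zeros(N)    # current ability sample

for em_iter in range(10):
    # E-step (Monte Carlo): Metropolis samples from theta | Y, b_hat
    # under a standard normal prior on ability.
    draws = []
    for _ in range(50):
        prop = theta + rng.normal(scale=0.5, size=N)
        log_ratio = (loglik(prop, b_hat) - 0.5 * prop**2
                     - loglik(theta, b_hat) + 0.5 * theta**2)
        theta = np.where(np.log(rng.uniform(size=N)) < log_ratio, prop, theta)
        draws.append(theta.copy())
    theta_mc = np.concatenate(draws[40:])        # keep post-burn-in draws
    n_mc = theta_mc.shape[0]
    y_mc = np.tile(Y, (n_mc // N, 1))            # responses aligned with draws

    # Inner-loop stage 1: non-parametric model of P(correct | ability, item
    # features), pooling information across items through their features.
    X = np.hstack([np.repeat(theta_mc, J)[:, None],
                   np.tile(item_feats, (n_mc, 1))])
    automl = GradientBoostingClassifier(n_estimators=50, max_depth=2)
    automl.fit(X, y_mc.ravel())
    p1 = automl.predict_proba(X)[:, 1].reshape(n_mc, J)

    # Inner-loop stage 2: item-specific parametric refit, initialized at the
    # stage-1 implied difficulty (logit p = theta - b under the Rasch model).
    b_hat = theta_mc.mean() - np.log(p1 / (1.0 - p1)).mean(axis=0)
    for _ in range(5):                           # per-item Newton refinement
        p = 1.0 / (1.0 + np.exp(-(theta_mc[:, None] - b_hat[None, :])))
        b_hat += (p - y_mc).sum(axis=0) / (p * (1.0 - p)).sum(axis=0)

print("corr(true_b, b_hat) =", np.corrcoef(true_b, b_hat)[0, 1])
```

The split mirrors the abstract's two-stage inner loop: the first stage lets a generic non-parametric learner share statistical strength across items via their features, while the second stage restores per-item flexibility with a small parametric model, so no specialized neural architecture is needed.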