Interactive assessments generate sequential process data that conventional item response models do not handle well. Existing MDP-based measurement approaches, such as the Markov decision process measurement model (MDP-MM; LaMar, 2018), link action choices to state-action values, but their reliance on person-specific tabular value functions makes them difficult to scale beyond small, fully enumerated tasks. We propose the Reinforcement Learning Measurement Model (RLMM), a measurement framework that decouples person-level choice sensitivity from task-level value representation through a shared parametric action-value function, which makes estimation more computationally efficient in larger process-data settings. The model combines a Boltzmann choice rule over normalized advantages, a soft Bellman consistency penalty, and a block-coordinate MAP procedure for joint estimation. It also yields step-level influence diagnostics for identifying behaviorally critical decisions. In peg-solitaire simulations, the RLMM achieved higher estimation accuracy and substantially lower runtime than the original MDP-MM, and its advantage grew with task complexity. In AQUALAB gameplay logs, the estimated person parameter was positively associated with cumulative reward, task completion, and behavioral efficiency. These results show that the RLMM extends decision-process-based psychometric models to larger and more behaviorally realistic environments while preserving an interpretable latent trait tied to decision-making steps.
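As a rough illustration of the choice rule and consistency penalty named in the abstract, the sketch below may help fix ideas. It is not the authors' implementation: the abstract does not give the exact formulas, so the range-based advantage normalization, the squared one-step Bellman residual, the discount value, and all function and parameter names (`normalized_advantages`, `boltzmann_probs`, `bellman_penalty`, `theta`, `gamma`) are assumptions made for illustration only.

```python
import math
from typing import List


def normalized_advantages(q_values: List[float]) -> List[float]:
    """Advantage of each action relative to the best one, scaled by the range.

    ASSUMPTION: the abstract only says "normalized advantages"; dividing by
    the max-min range is one plausible normalization, not the paper's stated one.
    """
    q_max = max(q_values)
    spread = q_max - min(q_values)
    if spread == 0.0:
        return [0.0 for _ in q_values]
    return [(q - q_max) / spread for q in q_values]


def boltzmann_probs(q_values: List[float], theta: float) -> List[float]:
    """Boltzmann (softmax) choice probabilities for one state.

    `theta` stands in for the person-level choice sensitivity; `q_values`
    stand in for the shared, task-level action values in that state.
    """
    logits = [theta * a for a in normalized_advantages(q_values)]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bellman_penalty(q_sa: float, reward: float, next_q_values: List[float],
                    gamma: float = 0.95, terminal: bool = False) -> float:
    """Squared one-step Bellman residual, used as a soft (penalized) constraint.

    ASSUMPTION: a max-backup target and a squared error; the paper may use a
    different target or loss.
    """
    target = reward if terminal else reward + gamma * max(next_q_values)
    return (q_sa - target) ** 2


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    for theta in (0.0, 1.0, 5.0):
        print(theta, [round(p, 3) for p in boltzmann_probs(q, theta)])
```

Under these assumptions, a sensitivity of zero yields uniform choice probabilities, and larger values concentrate probability on the highest-valued action, which is the sense in which a single person parameter can be read as decision quality.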