Arithmetic puzzle games provide a controlled setting for studying difficulty in mathematical reasoning tasks, a core challenge in adaptive learning systems. We investigate the structural determinants of difficulty in a class of integer arithmetic puzzles inspired by number games. We formalize the problem and develop an exact dynamic-programming solver that enumerates reachable targets, extracts minimal-operation witnesses, and enables large-scale labeling. Using this solver, we construct a dataset of over 3.4 million instances and define difficulty via the minimum number of operations required to reach a target. We analyze the relationship between difficulty and solver-derived features. While baseline machine learning models based on bag- and target-level statistics can partially predict solvability, they fail to reliably distinguish easy instances. In contrast, we show that difficulty is fully determined by a small set of interpretable structural attributes derived from exact witnesses. In particular, the number of input values used in a minimal construction serves as a minimal sufficient statistic for difficulty under this labeling. These results provide a transparent, computationally grounded account of puzzle difficulty that bridges symbolic reasoning and data-driven modeling. The framework supports explainable difficulty estimation and principled task sequencing, with direct implications for adaptive arithmetic learning and intelligent practice systems.
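As a rough illustration of the solver idea described above, the following is a minimal sketch of a subset dynamic program, not the authors' implementation. The puzzle rules are assumptions made here for concreteness: inputs are positive integers, each input is used at most once, the operations are +, -, * and exact division, and intermediate results must stay positive. The function name `solve` is hypothetical. Under these assumed rules, an expression that uses k inputs needs exactly k - 1 operations, so the minimum operation count for a target is one less than the smallest number of inputs that can reach it.

```python
def solve(bag):
    """Return {value: (min_ops, witness)} for every value reachable from bag.

    Assumed rules (not taken from the source): positive integer inputs,
    each used at most once; +, -, *, exact division; positive results only.
    """
    n = len(bag)
    table = {}  # table[mask]: value -> expression using exactly the inputs in mask
    best = {}   # best[value]: (fewest operations, one witness expression)
    for mask in range(1, 1 << n):
        if mask & (mask - 1) == 0:
            # Single input: reachable with zero operations.
            i = mask.bit_length() - 1
            exprs = {bag[i]: str(bag[i])}
        else:
            exprs = {}
            # Split mask into two non-empty disjoint parts. Both orderings
            # of each split are visited, which covers a-b and b-a.
            sub = (mask - 1) & mask
            while sub:
                rest = mask ^ sub
                for a, ea in table[sub].items():
                    for b, eb in table[rest].items():
                        exprs.setdefault(a + b, f"({ea}+{eb})")
                        exprs.setdefault(a * b, f"({ea}*{eb})")
                        if a > b:
                            exprs.setdefault(a - b, f"({ea}-{eb})")
                        if b != 0 and a % b == 0:
                            exprs.setdefault(a // b, f"({ea}/{eb})")
                sub = (sub - 1) & mask
        table[mask] = exprs
        ops = bin(mask).count("1") - 1  # k inputs combine in k - 1 operations
        for value, expr in exprs.items():
            if value not in best or ops < best[value][0]:
                best[value] = (ops, expr)
    return best


if __name__ == "__main__":
    labels = solve([2, 3, 7])
    ops, witness = labels[13]
    print(ops, witness)
```

Targets missing from the returned dictionary are unsolvable for that bag, so the same table yields both the solvability label and the difficulty label.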