Crash classification models in transportation safety are typically evaluated using accuracy, F1, or AUC, metrics that cannot reveal whether a model is silently overfitting. We introduce a spectral diagnostic framework grounded in Random Matrix Theory (RMT) and Heavy-Tailed Self-Regularization (HTSR) that spans the ML taxonomy: weight matrices for BERT/ALBERT/Qwen2.5, out-of-fold increment matrices for XGBoost/Random Forest, empirical Hessians for Logistic Regression, induced affinity matrices for Decision Trees, and graph Laplacians for KNN. Evaluating nine model families on two Iowa DOT crash classification tasks (173,512 and 371,062 records, respectively), we find that the power-law exponent $\alpha$ provides a structural quality signal: well-regularized models consistently yield $\alpha \in [2, 4]$ (mean $2.87 \pm 0.34$), while overfit variants show $\alpha < 2$ or spectral collapse. We observe a strong rank correlation between $\alpha$ and expert agreement (Spearman $\rho = 0.89$, $p < 0.001$), suggesting that spectral quality captures model behaviors aligned with expert reasoning. We propose an $\alpha$-based early-stopping criterion and a spectral model-selection protocol, and validate both against cross-validated F1 baselines. Sparse Lanczos approximations make the framework scalable to large datasets.
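The $\alpha$ diagnostic above can be sketched in a few lines: compute the empirical spectral density (ESD) of a layer's correlation matrix $W^\top W / N$ and fit a power law to its tail. This is a minimal sketch only; it uses a Hill estimator over an illustrative top fraction of eigenvalues (`tail_frac` is an assumption), whereas the paper's actual HTSR fitting procedure (e.g., MLE with data-driven $x_{\min}$ selection) may differ.

```python
import numpy as np

def esd_alpha(W, tail_frac=0.1):
    """Estimate the power-law tail exponent alpha of the ESD of W^T W / N.

    Uses a Hill estimator over the largest `tail_frac` fraction of
    eigenvalues. `tail_frac` is an illustrative choice, not the
    paper's fitting protocol.
    """
    N = W.shape[0]
    # Eigenvalues of the (symmetric, PSD) correlation matrix, descending.
    evals = np.sort(np.linalg.eigvalsh(W.T @ W / N))[::-1]
    k = max(2, int(tail_frac * evals.size))
    tail = evals[:k]
    x_min = tail[-1]
    # Hill estimator: alpha = 1 + k / sum_i log(lambda_i / x_min).
    return 1.0 + k / np.sum(np.log(tail / x_min))

# A random Gaussian layer stands in for a trained weight matrix here.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
alpha = esd_alpha(W)
print(f"alpha = {alpha:.2f}")
```

Under the HTSR reading summarized in the abstract, a trained layer with $\alpha \in [2, 4]$ would be flagged as well-regularized, while $\alpha < 2$ would flag likely overfitting; for very large layers, the dense `eigvalsh` call would be replaced by a sparse Lanczos solver.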