Refactor Analysis: Predictive Evaluations of Factor Models and Dimensionality

Unidimensional factor models justify some of the most consequential summaries in science (single scores, single ranks, and single leaderboards), yet unidimensionality is usually assessed indirectly, by fitting and evaluating models on images of the data (e.g., correlation matrices) rather than on the response matrix itself. We introduce Refactor analysis, a data-first evaluation paradigm that converts a one-factor solution into a rank-1 prediction of the original matrix by estimating both respondent-side and item-side structure from dual association images. We further introduce Verifactor analysis, which evaluates the same construction under bi-cross-validated (BCV) row-column partitions to assess how well it generalizes. In simulations where the data-generating mechanism is truly rank-1 and correlational, Refactor metrics align with classical unidimensionality indices, validating the approach. However, across 200 public dichotomous datasets, traditional fit and unidimensionality measures, though highly intercorrelated with one another, are only weakly related to data recoverability, especially out of sample. This gap exposes a methodological vulnerability: excellent image-based fit can coexist with poor data-level explanatory power. Finally, treating the association measure itself as a testable hypothesis, we compare the $\phi$ coefficient, the tetrachoric correlation, and the quadrant correlation $q'$, a measure we reintroduce here. Quadrant correlation emerges as a simple, interpretable, and remarkably robust alternative, yielding consistently stronger reconstruction and more stable behavior under sample-size variation than the commonly used $\phi$ and tetrachoric correlations. Together, Refactor and Verifactor shift unidimensionality assessment from "does a one-factor model fit the correlation matrix?" to the question that matters for measurement and benchmarking: does a one-factor dependence structure recover the observed responses and generalize beyond them?
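The abstract does not spell out the estimators, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. `refactor_r2` builds an item-side image (cosine similarity of item-centered columns, which on 0/1 data is the $\phi$ matrix) and a respondent-side image (cosine similarity of item-centered response rows, an assumed choice), takes the leading eigenvector of each, and reports the share of the centered sum of squares recovered by the least-squares rank-1 prediction. `bcv_rank1_r2` is the generic Owen and Perry style rank-1 SVD holdout for one row/column split, which may differ from the exact Verifactor procedure. All function names are hypothetical, and tetrachoric and $q'$ images are not implemented here.

```python
import numpy as np


def _cosine_image(M):
    """Cosine-similarity matrix between the rows of M (all-zero rows map to 0)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    U = M / np.where(norms > 0, norms, 1.0)
    return U @ U.T


def _leading_eigvec(R):
    """Unit-norm eigenvector of the largest eigenvalue of a symmetric matrix."""
    _, vecs = np.linalg.eigh(R)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


def refactor_r2(X):
    """Hypothetical Refactor-style score: rank-1 recovery of the item-centered matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # center each item (column)
    v = _leading_eigvec(_cosine_image(Xc.T))    # item side: phi matrix on 0/1 data
    u = _leading_eigvec(_cosine_image(Xc))      # respondent side (assumed image)
    s = u @ Xc @ v                              # least-squares scale for u v^T
    return s**2 / np.sum(Xc**2)                 # share of variance recovered


def bcv_rank1_r2(X, hold_rows, hold_cols):
    """Generic rank-1 bi-cross-validation on one row/column split.

    Rows/columns flagged True form the held-out block A, predicted from the
    other three blocks as A_hat = B @ pinv(D_1) @ C (D_1 = rank-1 part of D).
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(hold_rows, dtype=bool)
    c = np.asarray(hold_cols, dtype=bool)
    Xc = X - X[~r].mean(axis=0)                 # center with training rows only
    A, B = Xc[r][:, c], Xc[r][:, ~c]
    C, D = Xc[~r][:, c], Xc[~r][:, ~c]
    U, sv, Vt = np.linalg.svd(D, full_matrices=False)
    D1_pinv = np.outer(Vt[0], U[:, 0]) / sv[0]  # pseudoinverse of rank-1 part of D
    A_hat = B @ D1_pinv @ C
    return 1.0 - np.sum((A - A_hat) ** 2) / np.sum(A**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 300, 20
    theta = rng.normal(size=(n, 1))             # simulated one-factor abilities
    b = np.linspace(-1.0, 1.0, p)               # item thresholds
    X_struct = (theta + 0.5 * rng.normal(size=(n, p)) > b).astype(float)
    X_rand = (rng.random((n, p)) < 0.5).astype(float)
    hold_r = np.zeros(n, dtype=bool)
    hold_r[rng.permutation(n)[:75]] = True
    hold_c = np.zeros(p, dtype=bool)
    hold_c[rng.permutation(p)[:5]] = True
    print("in-sample:", refactor_r2(X_struct), refactor_r2(X_rand))
    print("BCV:", bcv_rank1_r2(X_struct, hold_r, hold_c),
          bcv_rank1_r2(X_rand, hold_r, hold_c))
```

On simulated one-factor threshold data both scores should be clearly higher than on unstructured coin-flip data. Because `u` and `v` are unit vectors, the in-sample score can never exceed the first-singular-value share of the centered matrix, which is one way to see why such a score lines up with classical eigenvalue-based indices when the data really are rank-1.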
