In machine learning, model performance is usually assessed by randomly splitting data into training and test sets. Different random splits, however, can yield markedly different performance estimates, so a genuinely good model may be discarded, or a poor one selected, purely because of an unlucky partition. This motivates a principled way to diagnose the quality of a given data split. We propose a diagnostic framework based on a new discrepancy measure, the Mahalanobis Distribution Alignment Score (MDAS). MDAS is a symmetric dissimilarity measure between two multivariate samples (not a strict metric) that captures both mean and covariance differences and is affine invariant. Building on it, we construct a Monte Carlo test that evaluates whether an observed split is statistically compatible with typical random splits, yielding an interpretable p-value for split quality. Using several real data sets, we study the relationship between MDAS and model robustness, including its association with the normalized Akaike information criterion. Finally, we apply MDAS to compare existing state-of-the-art deterministic data-splitting strategies with standard random splitting. The experiments show that MDAS provides a simple, model-agnostic tool for auditing data splits and improving the reliability of empirical model evaluation.
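To make the Monte Carlo split-auditing idea concrete, the sketch below shows one way the test could be organized. The abstract does not give the exact MDAS formula, so the `mdas` function here is a hypothetical stand-in: a symmetric, affine-invariant Mahalanobis-type discrepancy combining a mean term (under a pooled covariance) and a symmetrized covariance term. All names (`mdas`, `split_quality_pvalue`) and the specific formula are illustrative assumptions, not the paper's definition.

```python
import numpy as np


def mdas(X, Y, eps=1e-8):
    """Hypothetical symmetric, affine-invariant discrepancy between samples X and Y.

    Combines a Mahalanobis distance between the sample means (under a pooled
    covariance) with a symmetrized trace term that penalizes covariance
    differences. This is an illustrative stand-in for the paper's MDAS.
    """
    p = X.shape[1]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Sx = np.cov(X, rowvar=False) + eps * np.eye(p)
    Sy = np.cov(Y, rowvar=False) + eps * np.eye(p)
    S_pooled = 0.5 * (Sx + Sy)

    # Mean term: squared Mahalanobis distance between the two sample means.
    d = mx - my
    mean_term = float(d @ np.linalg.inv(S_pooled) @ d)

    # Covariance term: trace(Sx^{-1} Sy + Sy^{-1} Sx) - 2p, which is symmetric,
    # affine invariant, and zero when the two covariances coincide.
    cov_term = float(np.trace(np.linalg.inv(Sx) @ Sy + np.linalg.inv(Sy) @ Sx)) - 2 * p

    return mean_term + cov_term


def split_quality_pvalue(X, test_idx, n_mc=1000, seed=0):
    """Monte Carlo p-value for an observed train/test split.

    Compares the discrepancy of the observed split against discrepancies from
    random splits of the same size; a small p-value flags an atypical split.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    test_mask = np.zeros(n, dtype=bool)
    test_mask[np.asarray(test_idx)] = True
    n_test = int(test_mask.sum())

    observed = mdas(X[~test_mask], X[test_mask])

    null_vals = np.empty(n_mc)
    for b in range(n_mc):
        perm = rng.permutation(n)
        m = np.zeros(n, dtype=bool)
        m[perm[:n_test]] = True
        null_vals[b] = mdas(X[~m], X[m])

    # Proportion of random splits at least as discrepant as the observed one
    # (with the +1 correction so the p-value is never exactly zero).
    return (1 + np.sum(null_vals >= observed)) / (n_mc + 1)


if __name__ == "__main__":
    # Example: audit the last 20% of a synthetic data set used as the test set.
    X = np.random.default_rng(1).normal(size=(500, 5))
    print(split_quality_pvalue(X, test_idx=np.arange(400, 500)))
```

The key design point is that the test is model-agnostic: it only looks at the feature distributions of the two subsets, so it can audit any proposed split before any model is trained.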