Machine learning is vital in high-stakes domains, yet conventional validation methods rely on averaged metrics such as mean squared error (MSE) or mean absolute error (MAE), which do not quantify extreme errors. Worst-case prediction failures can have substantial consequences, but current validation frameworks lack a statistical foundation for assessing their probability. In this work, a new statistical framework based on Extreme Value Theory (EVT) is presented that provides a rigorous approach to estimating worst-case failures. Applied to synthetic and real-world datasets, the method is shown to enable robust estimation of catastrophic failure probabilities, overcoming fundamental limitations of standard cross-validation. This work establishes EVT as a fundamental tool for assessing model reliability, supporting safer AI deployment in new technologies where uncertainty quantification is central to decision-making or scientific analysis.
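To make the idea of estimating worst-case failure probabilities concrete, the following is a minimal sketch of one standard EVT technique, the peaks-over-threshold method, applied to absolute prediction errors. It is not taken from the paper: the function name `tail_exceedance_probability`, the 95% threshold quantile, and the synthetic heavy-tailed errors are illustrative assumptions, and the paper's actual procedure may differ.

```python
import numpy as np
from scipy.stats import genpareto


def tail_exceedance_probability(abs_errors, level, threshold_quantile=0.95):
    """Estimate P(|error| > level) with a peaks-over-threshold GPD fit.

    Hypothetical helper for illustration; the threshold choice is an assumption.
    """
    abs_errors = np.asarray(abs_errors, dtype=float)
    # Threshold u above which the tail is modelled by a generalized Pareto law.
    u = np.quantile(abs_errors, threshold_quantile)
    if level <= u:
        # Below the threshold there is enough data to use the empirical rate.
        return float(np.mean(abs_errors > level))
    excesses = abs_errors[abs_errors > u] - u
    # Fit shape and scale of the GPD to the excesses; location is fixed at 0.
    shape, _, scale = genpareto.fit(excesses, floc=0)
    # P(X > level) = P(X > u) * P(X - u > level - u | X > u)
    p_exceed_u = excesses.size / abs_errors.size
    return float(p_exceed_u * genpareto.sf(level - u, shape, loc=0, scale=scale))


# Synthetic heavy-tailed "prediction errors" (an assumption for the demo).
rng = np.random.default_rng(0)
errors = np.abs(rng.standard_t(df=4, size=5000))

p_tail = tail_exceedance_probability(errors, level=8.0)
print(f"MAE: {errors.mean():.3f}")
print(f"Estimated P(|error| > 8): {p_tail:.2e}")
```

The mean absolute error summarizes typical behaviour, while the fitted tail gives a probability for errors larger than most or all of those observed in validation, which is the quantity an averaged metric cannot provide. In practice the threshold quantile would need to be checked, for example with a mean-excess plot, since the estimate can be sensitive to it.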