Reliable uncertainty quantification (UQ) in machine learning (ML) regression tasks is becoming the focus of many studies in materials and chemical science. It is now well understood that average calibration is insufficient, and most studies implement additional methods to test conditional calibration with respect to uncertainty, i.e. consistency. Consistency is assessed mostly with so-called reliability diagrams. There is, however, another validation target beyond average calibration: conditional calibration with respect to input features, i.e. adaptivity. In practice, adaptivity is the main concern of the end users of an ML-UQ method, who seek reliable predictions and uncertainties at any point in feature space. This article aims to show that consistency and adaptivity are complementary validation targets, and that good consistency does not imply good adaptivity. Adapted validation methods are proposed and illustrated on a representative example.
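The claim that good consistency does not imply good adaptivity can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data-generating model, the helper name `binned_zms`, the choice of ten equal-count bins, and the use of the mean squared z-score (error divided by predicted uncertainty, expected to be close to 1 when calibrated) as the calibration statistic. The predicted uncertainty `u` is drawn independently of the feature `x`, while the true error scale depends on `x` in a way that averages out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical setup: one input feature x, and predicted uncertainties u
# that are drawn independently of x.
x = rng.uniform(-1.0, 1.0, n)
u = rng.uniform(0.5, 1.5, n)

# The true error scale is u times a factor that depends on x.
# The squared factor is 0.5 for x < 0 and 1.5 for x >= 0, so it averages to 1.
scale = np.where(x < 0, np.sqrt(0.5), np.sqrt(1.5))
errors = u * scale * rng.standard_normal(n)

# z-scores: for calibrated uncertainties, mean(z**2) should be close to 1.
z = errors / u


def binned_zms(z, cond, n_bins=10):
    """Mean squared z-score within equal-count bins of a conditioning variable."""
    order = np.argsort(cond)
    groups = np.array_split(z[order], n_bins)
    return np.array([np.mean(g**2) for g in groups])


zms_all = np.mean(z**2)    # average calibration
zms_by_u = binned_zms(z, u)  # conditional on uncertainty (consistency)
zms_by_x = binned_zms(z, x)  # conditional on the feature (adaptivity)

print("average:", np.round(zms_all, 2))
print("binned by u:", np.round(zms_by_u, 2))
print("binned by x:", np.round(zms_by_x, 2))
```

In this toy model the statistic stays near 1 overall and in every uncertainty bin, yet it splits into values near 0.5 and near 1.5 when binned by `x`. A validation based only on uncertainty-conditioned diagnostics would therefore not reveal that the uncertainties are too large for one half of feature space and too small for the other.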