Reliable uncertainty quantification (UQ) in machine learning (ML) regression tasks is becoming the focus of many studies in materials and chemical science. It is now well understood that average calibration is insufficient, and most studies implement additional methods to test conditional calibration with respect to uncertainty, i.e. consistency. Consistency is assessed mostly by so-called reliability diagrams. There is, however, another way to go beyond average calibration: conditional calibration with respect to input features, i.e. adaptivity. In practice, adaptivity is the main concern of the end users of an ML-UQ method, who seek reliable predictions and uncertainties at any point in feature space. This article aims to show that consistency and adaptivity are complementary validation targets, and that good consistency does not imply good adaptivity. Appropriate validation methods are proposed and illustrated on a representative example.
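The distinction between consistency and adaptivity can be illustrated with a minimal synthetic sketch. It is not the validation method proposed in the article: the data model, the function name `binned_mean_z2`, the equal-count binning, and the use of the mean squared z-score as the calibration statistic are all assumptions chosen for illustration. Errors are generated so that the z-scores are calibrated on average and conditionally on the uncertainty `u`, but not conditionally on the input feature `x`.

```python
import numpy as np


def binned_mean_z2(z, cond, n_bins=10):
    """Mean squared z-score in equal-count bins of a conditioning variable.

    A value near 1 in every bin indicates calibration conditional on `cond`
    (assuming unbiased errors). Hypothetical helper, not from the article.
    """
    order = np.argsort(cond)
    return np.array([np.mean(z[idx] ** 2) for idx in np.array_split(order, n_bins)])


rng = np.random.default_rng(0)
n = 100_000

x = rng.uniform(0.0, 1.0, n)   # input feature
u = rng.uniform(0.5, 2.0, n)   # predicted uncertainty, independent of x (assumption)

# Feature-dependent miscalibration factor with E[s^2] = 1 over x ~ U(0, 1),
# so the uncertainties are correct on average but wrong at any given x.
s = np.sqrt(0.25 + 1.5 * x)
errors = u * s * rng.standard_normal(n)

z = errors / u                 # z-scores

mean_z2 = np.mean(z ** 2)      # average calibration
by_u = binned_mean_z2(z, u)    # consistency: conditional on uncertainty
by_x = binned_mean_z2(z, x)    # adaptivity: conditional on input feature

print("average    :", round(float(mean_z2), 3))
print("cond. on u :", np.round(by_u, 2))
print("cond. on x :", np.round(by_x, 2))
```

Under these assumptions, the values conditioned on `u` should stay close to 1 in every bin, whereas those conditioned on `x` should rise from about 0.3 to about 1.7. A test of consistency alone would therefore pass this model, while a test of adaptivity would flag it.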