Probability predictions are essential to inform decision making in medicine, economics, image classification, sports analytics, entertainment, and many other fields. Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, i.e., far from the base rate of the event. Predictions that satisfy these three criteria are informative for decision making. However, there is a fundamental tension between calibration and boldness, since calibration metrics can score well even when predictions are overly cautious, i.e., non-bold. The purpose of this work is to develop a hypothesis test and Bayesian model selection approach to assess calibration, and a strategy for boldness-recalibration that enables practitioners to responsibly embolden predictions subject to their required level of calibration. Specifically, we allow the user to pre-specify their desired posterior probability of calibration, then maximally embolden predictions subject to this constraint. We verify the performance of our procedures via simulation, then demonstrate the breadth of applicability by applying these methods to real-world case studies in each of the fields mentioned above. We find that a very slight relaxation of calibration probability (e.g., from 0.99 to 0.95) can often substantially embolden predictions (e.g., widening the range of hockey predictions from 0.25-0.75 to 0.10-0.90).
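The tension between calibration and boldness can be illustrated with a small toy example. The sketch below is not the hypothesis test or Bayesian procedure developed in this work. It assumes, for illustration only, that boldness is measured by the standard deviation of the predictions and that calibration is measured by the largest gap between a stated probability and the observed event frequency. The helper names `boldness` and `calibration_gap` and the data are hypothetical.

```python
from statistics import pstdev


def boldness(preds):
    """Spread of the predictions (assumed proxy for boldness)."""
    return pstdev(preds)


def calibration_gap(preds, outcomes):
    """Largest gap between a stated probability and the observed event
    frequency among the cases that received that probability
    (assumed toy calibration measure; 0 means perfectly calibrated)."""
    groups = {}
    for p, y in zip(preds, outcomes):
        groups.setdefault(p, []).append(y)
    return max(abs(p - sum(ys) / len(ys)) for p, ys in groups.items())


# Toy data: 10 events, 5 of which occur, so the base rate is 0.5.
outcomes = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

# Cautious forecaster: always predicts the base rate.
cautious = [0.5] * 10
# Bold forecaster: 0.8 for the first five events, 0.2 for the last five.
bold = [0.8] * 5 + [0.2] * 5
# Overly bold forecaster: pushes the same predictions out to 0.99 and 0.01.
overbold = [0.99] * 5 + [0.01] * 5

for name, preds in [("cautious", cautious), ("bold", bold), ("overbold", overbold)]:
    print(name, round(boldness(preds), 3), round(calibration_gap(preds, outcomes), 3))
```

In this toy data the cautious and bold forecasters both have a calibration gap of zero, yet only the bold one distinguishes likely events from unlikely ones. Pushing the predictions further toward 0 and 1 increases boldness but opens a calibration gap, which is the trade-off that boldness-recalibration is meant to manage.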