Testing earthquake forecasts is essential to obtain scientific information on forecasting models and to establish sufficient credibility for societal use. We aim to enhance the testing phase proposed by the Collaboratory for the Study of Earthquake Predictability (CSEP, Schorlemmer et al., 2018) with new statistical methods supported by mathematical theory. To demonstrate their applicability, we evaluate three short-term forecasting models that were submitted to the CSEP-Italy experiment, along with two ensemble models built from them. The models produce weekly overlapping forecasts of the expected number of M4+ earthquakes in a collection of grid cells. We compare the models' forecasts using consistent scoring functions for means or expectations, which are widely used and theoretically principled tools for forecast evaluation. We further discuss and demonstrate their connection to CSEP-style earthquake likelihood model testing, and specifically suggest an improvement of the T-test. Then, using tools from isotonic regression, we investigate forecast reliability and apply score decompositions in terms of calibration and discrimination. Our results show where and how models outperform their competitors and reveal a substantial lack of calibration in various models. The proposed methods also apply to full-distribution (e.g., catalog-based) forecasts, without requiring Poisson distributions or any other parametric assumption.
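As a minimal sketch of the comparison idea described above: a scoring function is consistent for the mean if its expected value is minimized by forecasting the true expectation, so averaging scores over grid cells ranks models by how well they forecast expected counts. The example below uses the squared error, a standard consistent scoring function for the mean; the model names, rates, and data are hypothetical and not taken from the CSEP-Italy experiment.

```python
import numpy as np

# Hypothetical illustration (not the paper's data): two models forecast
# the expected number of earthquakes per grid cell; observations are
# simulated counts. The squared error S(x, y) = (x - y)^2 is a
# consistent scoring function for the mean, so the model with the
# lower average score is the better forecaster of expected counts.
rng = np.random.default_rng(0)

n_cells = 1000
observed = rng.poisson(lam=0.1, size=n_cells)   # simulated observed counts
forecast_a = np.full(n_cells, 0.10)             # model A: expected counts per cell
forecast_b = np.full(n_cells, 0.25)             # model B: systematically too high

def mean_score(forecast, obs):
    """Average squared-error score over all grid cells."""
    return np.mean((forecast - obs) ** 2)

score_a = mean_score(forecast_a, observed)
score_b = mean_score(forecast_b, observed)

# Model A forecasts the true expectation, so its mean score is lower.
print(score_a < score_b)
```

Note that consistency is what licenses this ranking: any consistent scoring function for the expectation (e.g., the Poisson scoring function for count forecasts) would, in expectation, favor the model whose forecast equals the true mean.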