Probability estimation models play an important role in fields such as weather forecasting, recommendation systems, and sports analysis. When several models estimate probabilities, it is difficult to determine which one gives reliable probabilities, because the ground-truth probabilities are not available. Win probability estimation for esports, which calculates the probability of winning given a game state, is an actively studied application of probability estimation. However, most previous works evaluated their models using accuracy, a metric that can only measure discrimination performance. In this work, we first investigate the Brier score and the Expected Calibration Error (ECE) as replacements for accuracy as the evaluation metric for win probability estimation models in esports. Based on this analysis, we propose a novel metric called the Balance score, a simple yet effective metric with respect to six desirable properties that a probability estimation metric should have. Under a general condition, we also find that the Balance score can effectively approximate the true expected calibration error, which ECE approximates only imperfectly through binning. Extensive evaluations on simulation studies and real game snapshot data demonstrate the potential of the proposed metric not only for win probability estimation models in esports but also for evaluating general probability estimation models.
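To illustrate why accuracy alone cannot tell a reliable probability estimator from an unreliable one, the following minimal sketch compares accuracy, the Brier score, and a binned ECE on two hypothetical models. The data, the function names, the 0.5 decision threshold, and the choice of ten equal-width bins are all assumptions made for illustration; the ECE variant shown is the common binary form that compares mean predicted win probability with observed win frequency per bin, and it is not necessarily the exact formulation used in this work. The Balance score is not reproduced here because its definition is not given in the abstract.

```python
from typing import Sequence


def accuracy(probs: Sequence[float], outcomes: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples where thresholding the probability matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and binary outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def binned_ece(probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """Binned ECE: weighted mean of |mean predicted probability - observed frequency|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_p = sum(p for p, _ in members) / len(members)
        win_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - win_rate)
    return ece


# Hypothetical game snapshots: 1 = the team eventually won, 0 = it lost.
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]

# Two hypothetical models that rank the snapshots identically.
calibrated = [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]
overconfident = [0.99, 0.99, 0.99, 0.99, 0.01, 0.01, 0.01, 0.01]

for name, probs in [("calibrated", calibrated), ("overconfident", overconfident)]:
    print(
        f"{name:>13}: accuracy={accuracy(probs, outcomes):.3f} "
        f"brier={brier_score(probs, outcomes):.4f} "
        f"ece={binned_ece(probs, outcomes):.3f}"
    )
```

Both hypothetical models make the same thresholded predictions, so their accuracy is identical, while the Brier score and the binned ECE both penalize the overconfident one. With real data the binned ECE also depends on the number and placement of bins, which is the approximation issue the abstract refers to.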